Abstract
Molecular property prediction is central to drug discovery because it helps identify candidate compounds with desired characteristics. We introduce MolHyGAN, a hypergraph attention network that captures higher-order molecular interactions to improve predictive performance. MolHyGAN represents molecular data as a hypergraph in which nodes correspond to molecular substructures, extracted using Extended Connectivity Fingerprints (ECFP) or k-mer sequences, and hyperedges correspond to the molecules themselves. This representation lets the model capture complex molecular relationships that traditional graph neural networks often overlook. Key components of MolHyGAN are attention mechanisms that prioritize important molecular substructures and stratified scaffold splitting for robust generalization. We evaluate the model on four benchmark datasets (BACE, BBBP, ClinTox, and SIDER). Using ECFP with varying radii (R = 2, 4, 6) and k-mer lengths (K = 3, 5, 7), MolHyGAN improves AUC-ROC scores and achieves state-of-the-art results, particularly on balanced datasets. For example, MolHyGAN achieves an AUC-ROC of 0.9752 on BACE and 0.9939 on BBBP, outperforming existing methods. These results highlight MolHyGAN’s potential for molecular property prediction in drug discovery and chemical research. Future work will explore hybrid approaches that integrate additional molecular fingerprints and embeddings to further improve performance.
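To make the hypergraph representation concrete, the following minimal sketch builds a node-by-hyperedge incidence matrix in which each row is a substructure and each column is a molecule. It is an illustration under stated assumptions, not the MolHyGAN implementation: it assumes k-mers are overlapping character windows over SMILES strings (so multi-character atoms such as Cl are split), and the names `kmers`, `build_incidence`, and `toy_smiles` are hypothetical. ECFP-based nodes would be built the same way, with fingerprint substructure identifiers in place of k-mers.

```python
def kmers(smiles, k):
    """Return all overlapping character k-mers of a SMILES string.

    Assumption: character-level windows; a real tokenizer would keep
    multi-character atoms (e.g. "Cl", "Br") intact.
    """
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def build_incidence(smiles_list, k=3):
    """Build a node-by-hyperedge incidence matrix.

    Rows are unique k-mers (nodes); columns are molecules (hyperedges).
    H[i][j] is 1 when k-mer i occurs in molecule j, otherwise 0.
    """
    # Assign each distinct k-mer a row index in order of first appearance.
    node_index = {}
    for smi in smiles_list:
        for sub in kmers(smi, k):
            node_index.setdefault(sub, len(node_index))

    # One row per k-mer, one column per molecule.
    H = [[0] * len(smiles_list) for _ in node_index]
    for j, smi in enumerate(smiles_list):
        for sub in set(kmers(smi, k)):
            H[node_index[sub]][j] = 1
    return node_index, H


# Hypothetical toy input: ethanol, methoxyethane, ethylamine.
toy_smiles = ["CCO", "CCOC", "CCN"]
node_index, H = build_incidence(toy_smiles, k=3)

for sub, i in node_index.items():
    print(sub, H[i])
```

In this toy example the k-mer `CCO` is shared by the first two molecules, so its row connects two hyperedges. Shared substructures of this kind are what link molecules in the hypergraph, and an attention layer could then weight them by importance.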