Materials discovery acceleration by using conditional generative methodology
Abstract
With the rapid advancement of AI technologies, generative models have been increasingly employed in the exploration of novel materials. Combined with traditional computational approaches such as density functional theory (DFT) and molecular dynamics (MD), existing generative models, including diffusion models and autoregressive models, have demonstrated remarkable potential for discovering novel materials. However, their efficiency in goal-directed materials design remains suboptimal. In this work, we develop PODGen, a highly transferable, efficient, and robust conditional generation framework that integrates a general generative model with multiple property prediction models. Based on PODGen, we design a workflow for high-throughput conditional crystal generation and use it to search for new topological insulators (TIs). Our results show that the success rate of generating TIs with our framework is 5.3 times higher than that of the unconstrained approach. More importantly, while unconstrained generation rarely produces gapped TIs, our framework produces them consistently, a marked improvement. This demonstrates that conditional generation significantly enhances the efficiency of targeted materials discovery. Using this method, we generated nearly 20,000 new topological materials and performed further first-principles calculations on those with promising application potential. Furthermore, we identified promising, synthesizable topological (crystalline) insulators such as , , , and .
I Introduction
New materials play a crucial role in industrial and technological fields [zhang2021dealing], offering unique properties that drive more efficient and sustainable solutions [hu2021research]. Crystalline materials, with their highly ordered structures, excel in areas such as electronics, optoelectronics, and medicine, providing significant support for technological advancement [disa2021engineering; butt2021recent; chaudhary2022advances]. However, traditional experimental and theoretical computation methods are increasingly unable to meet the growing demand for new materials [agrawal2016perspective].
With the rapid advancement of AI technology over the past decade, new research paradigms have been introduced into the discovery of novel crystalline materials, offering the potential to overcome the limitations of traditional methods [paradigms; review]. Well-developed predictive machine learning models have already demonstrated their ability to screen crystal structures rapidly and accurately [cgcnn; megnet; alignn; dimenet; gemnet; coNGN; liang2023material], assisting, accelerating, and even replacing first-principles calculations [m3gnet; chgnet; gnome; liang2024cluster; yang2024mattersim; omat24; AlphaNet; hamgnn; hamgnn2; deeph2; deephe3; dpmoire]. In recent years, various generative machine learning models have been applied to the exploration of new crystal structures, including diffusion-based models [cdvae; diffcsp; diffcsp++; joshi2025all] such as CDVAE, autoregressive models [schnet; cifllm; crystalformer] such as CrystalFormer, flow-based models [miller2024flowmm; luo2024crystalflow] such as FlowMM, and several other models [qiu2024vqcrystal; sriram2024flowllm]. These general generative models primarily learn the distribution of crystal structures in a training dataset, so that novel structures can be sampled from it. Alongside them, conditional generative models have been developed to generate crystal structures tailored to specific target properties; examples include MatterGen [zeni2023mattergen], MatterGPT [chen2024mattergpt], Con-CDVAE [con-cdvae], and Cond-CDVAE [cond-cdvae]. Recent efforts have also employed reinforcement learning [CFRL; RLTI] or active learning [active1; active2] to achieve conditional generation of crystal structures.
Several studies have explored the application of generative models to crystal structure discovery [cdvae_super; cdvae_2d; cdvae_topo; wyck_gen]. However, materials with desirable physical properties and practical applications often constitute only a small fraction of known structures. In such cases, conditional generative models offer a more efficient approach than general generative models because they guide the search toward structures that meet specific criteria.
In this paper, we propose a conditional generation framework named PODGen, short for using Predictive models to Optimize the Distribution of the Generative model for conditional generation. It can be applied to various generative and predictive models and effectively improves the success rate of generation. We have also designed a workflow for high-throughput generation of crystal structures that includes structure optimization, property verification, and structure deduplication. We demonstrate its application to the generation of topological insulators, crystalline materials whose special electronic band structures give rise to protected surface states with unique electrical and spin-related properties [tqcwang]. In total, 19,250 topological insulators and topological crystalline insulators were generated. Further first-principles calculations on the candidates with the most promising application potential identified 12 new dynamically stable crystal structures (no imaginary phonon modes) with desirable properties, 5 of which are located at the bottom of the potential energy surface (PES).
II Method

II.1 Conditional generation framework
II.1.1 Basic composition
Most general generative models widely used for crystal structure generation, such as autoregressive, diffusion, and flow-based models, are probabilistic generative models. Fundamentally, these models generate new crystal structures by learning the distribution of crystal structures present in the training dataset. Generating structures with such a model can be understood as sampling from the distribution $p_\theta(x)$, where $x$ represents a crystal structure and $p_\theta(x)$ is the learned distribution approximating the true distribution $p(x)$ observed in the training data. However, when the aim is to generate crystals within a specific domain, the objective shifts to sampling from the conditional distribution $p(x \mid y)$, where $y$ denotes the target properties characterizing the materials within that domain. By Bayes' theorem, $p(x \mid y) = p(y \mid x)\,p(x)/p(y)$. Here, $p(y)$ does not depend on $x$ and acts as a normalization constant that can be ignored. Therefore, sampling from $p(x \mid y)$ can be reformulated as sampling from the unnormalized distribution $p(x)\,p(y \mid x)$.
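The reformulation can be checked numerically on a toy discrete space. The sketch below, in which all probabilities are hypothetical, samples from $p(x)\,p(y \mid x)$ by plain rejection sampling: draw $x$ from the prior and keep it with probability $p(y \mid x)$. The kept samples follow $p(x \mid y)$ without $p(y)$ ever being evaluated, but the acceptance rate equals $p(y)$. This is why generating unconditionally and then filtering becomes inefficient when the target property is rare. It is a didactic illustration of the Bayes identity, not the sampler used by PODGen, which relies on MCMC as described below.

```python
import random
from collections import Counter

# Hypothetical toy values: prior p(x) from a generative model and
# p(y|x) from a property classifier over three candidate structures.
prior = {"A": 0.70, "B": 0.25, "C": 0.05}
likelihood = {"A": 0.01, "B": 0.20, "C": 0.90}


def exact_posterior(prior, likelihood):
    """Bayes: p(x|y) = p(y|x) p(x) / p(y), with p(y) = sum_x p(y|x) p(x)."""
    evidence = sum(prior[x] * likelihood[x] for x in prior)
    return {x: prior[x] * likelihood[x] / evidence for x in prior}, evidence


def rejection_sample(prior, likelihood, n, rng):
    """Draw x ~ p(x) and keep it with probability p(y|x).

    The kept samples follow p(x|y) although p(y) is never computed;
    the fraction of kept draws estimates p(y)."""
    states, weights = list(prior), list(prior.values())
    kept, proposed = [], 0
    while len(kept) < n:
        x = rng.choices(states, weights=weights)[0]
        proposed += 1
        if rng.random() < likelihood[x]:
            kept.append(x)
    return kept, proposed


rng = random.Random(0)
posterior, evidence = exact_posterior(prior, likelihood)
samples, proposed = rejection_sample(prior, likelihood, 10000, rng)
counts = Counter(samples)
for x in prior:
    print(x, round(posterior[x], 3), round(counts[x] / len(samples), 3))
print("acceptance rate:", round(len(samples) / proposed, 3), "p(y):", round(evidence, 3))
```

With these numbers $p(y) \approx 0.10$, so roughly nine out of ten unconditional draws are discarded.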
Building on the above analysis, we developed PODGen, a highly transferable and robust conditional generation framework. It consists of three key components: (1) a general generative model that provides $p_\theta(x)$ to approximate $p(x)$; (2) multiple predictive models that provide $p_\phi(y \mid x)$ to approximate $p(y \mid x)$; and (3) an efficient sampling method, Markov chain Monte Carlo (MCMC) sampling. The fundamental steps of this framework are illustrated in FIG. 1.
In this framework, the generative model only needs to provide probabilistic estimates of crystal structures, with no restriction on the specific model type. In addition, most widely used predictive models can be seamlessly integrated into the framework. Classification-based predictive models, which commonly employ a cross-entropy loss, inherently yield probability estimates for each class. Regression-based predictive models commonly use the mean squared error (MSE) as their loss function, which corresponds to a probabilistic model assuming that the observed property follows a Gaussian distribution centered on the predicted value with a fixed variance. Therefore, most predictive models can provide the probability $p_\phi(y \mid x)$.
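A minimal sketch of how the two kinds of predictors can be turned into the log-probabilities a sampler needs, assuming a classifier that returns raw logits and a regressor whose noise scale `sigma` is chosen by the user (both interfaces are hypothetical; the paper does not specify them). Log-probabilities are used instead of probabilities because they are simply added when several predictors are combined.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def classifier_log_prob(logits, target_class):
    """log p(y = target_class | x) from a classifier trained with cross-entropy."""
    return log_softmax(logits)[target_class]


def regression_log_prob(y_target, y_pred, sigma):
    """log N(y_target; y_pred, sigma^2): the Gaussian likelihood implied by an
    MSE loss, with a fixed standard deviation sigma chosen by the user."""
    z = (y_target - y_pred) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


# Hypothetical binary "is TI" classifier output for one structure: [not TI, TI].
print(math.exp(classifier_log_prob([0.2, 1.5], 1)))
```

For a regression target, `sigma` controls how sharply the sampler is pushed toward the desired value: a smaller `sigma` penalizes deviations more strongly.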
II.1.2 Crystal generation
MCMC sampling is an efficient method for sampling from complex high-dimensional distributions. Similar methods have been used with language models to generate sentences that satisfy certain conditions [miao2019cgmh; zhang2020language] or to optimize results [song2025llmfeynmanleveraginglargelanguage]. MCMC generates a sequence of correlated samples by iteratively transitioning from one state to another according to the transition probabilities of a Markov chain. In the context of crystal structure generation, each state corresponds to a specific crystal structure [crystalformer]. The Metropolis-Hastings (MH) algorithm [HM_M; HM_H] provides a simple way to construct such a chain: it proposes a new state using a designed update strategy and then accepts the transition with the probability given by Eqs. (1) and (2). Because only the ratio of target probabilities between the proposed and current states enters this acceptance probability, the normalization constant never needs to be evaluated. The detailed balance condition established in this way ensures that the samples obtained through MCMC follow the target distribution.
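$$A(x' \mid x) = \min\left\{1,\ \frac{\pi(x')\,Q(x \mid x')}{\pi(x)\,Q(x' \mid x)}\right\} \tag{1}$$
$$A(x' \mid x) \approx \min\left\{1,\ \frac{p_\theta(x')\,p_\phi(y \mid x')\,Q(x \mid x')}{p_\theta(x)\,p_\phi(y \mid x)\,Q(x' \mid x)}\right\} \tag{2}$$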
Here, $\pi(x)$ represents the target probability distribution, $Q(x' \mid x)$ denotes the probability of proposing a transition from the current crystal structure $x$ to crystal structure $x'$, and $A(x' \mid x)$ is the acceptance probability for this proposed transition. If the proposal is accepted, the next state is $x_{t+1} = x'$; otherwise, $x_{t+1} = x$. In this study, we applied this conditional generation framework to the generation of topological insulators, where $\pi(x)$ is specifically defined as:
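$$\pi(x) = p(x \mid \mathrm{TI}, \mathrm{NMet}, \mathrm{NMag}) \propto p(x)\, p(\mathrm{TI}, \mathrm{NMet}, \mathrm{NMag} \mid x) \tag{3}$$
$$\pi(x) \propto p_\theta(x)\, p_\phi(\mathrm{TI} \mid x)\, p_\phi(\mathrm{NMet} \mid x)\, p_\phi(\mathrm{NMag} \mid x) \tag{4}$$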
In this paper, we train a CrystalFormer [crystalformer] as the general generative model to provide $p_\theta(x)$, and three classification models [dimenet] to provide $p_\phi(\mathrm{TI} \mid x)$, $p_\phi(\mathrm{NMet} \mid x)$, and $p_\phi(\mathrm{NMag} \mid x)$, where TI stands for topological insulator, NMet for non-metal, and NMag for non-magnetic. Because existing tools for rapidly identifying the topological properties of crystals [he2019symtopo; tqcmethod] are all symmetry-based, we selected CrystalFormer, which inherently encodes space group and Wyckoff position information. In contrast, the predictive models were neither meticulously curated nor extensively trained, which further demonstrates the robustness of our framework. For more information about these models, please refer to Supplementary Section S1. An analysis of the existing databases [zhang2019catalogue; tqc] shows that topological insulators are more likely to be found in crystals containing these elements . Therefore, we introduce an element-dependent factor to modify the crystal probability $p_\theta(x)$, so that crystals containing these elements are generated with a higher probability.
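In practice the unnormalized target is best evaluated in log space: the likelihood an autoregressive model such as CrystalFormer assigns to a whole structure is a product over many generated tokens and can easily underflow in floating point. The sketch below assumes the log-probabilities are already available from the models and adds a purely hypothetical element term, `bias` times the fraction of species in the structure that belong to a user-supplied favored set; the functional form of the element-dependent factor actually used in PODGen is not given here.

```python
import math


def log_target(log_p_gen, log_p_props, composition, favored_elements=(), bias=0.0):
    """Unnormalized log pi(x) = log p_theta(x) + sum_i log p_phi(y_i | x) + element term.

    log_p_gen:   log p_theta(x) from the generative model.
    log_p_props: log p_phi(y_i | x) for each property (e.g. TI, NMet, NMag).
    composition: element symbols present in x (a list, set, or dict keys).
    favored_elements, bias: a HYPOTHETICAL reweighting that adds
        bias * (fraction of the species in x that are favored)."""
    species = set(composition)
    favored = species & set(favored_elements)
    element_term = bias * len(favored) / len(species) if species else 0.0
    return log_p_gen + sum(log_p_props) + element_term
```

With `bias=0` the function reduces to the plain sum of generative and predictive log-probabilities; a positive `bias` tilts sampling toward the favored elements without forbidding others.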
We employ three types of proposals, each corresponding to modifications in atomic species, atomic coordinates, and lattice constants. At each step of the Markov chain, one of these proposals is randomly selected with probabilities of 0.2, 0.4, and 0.4, respectively. When modifying atomic species, we first select a Wyckoff position from the current configuration with equal probability and then replace the atomic species at that position with a randomly chosen element. For atomic coordinate modifications, we apply Gaussian noise to the fractional coordinates of all atoms with degrees of freedom, while respecting Wyckoff position constraints, which may prevent certain atomic coordinates from being altered. Similarly, when modifying lattice constants, Gaussian noise is added to all adjustable lattice parameters, subject to space group constraints, which may restrict changes to certain lattice constants.
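The update scheme can be sketched as a single Metropolis-Hastings step. The state representation here is a simplified, hypothetical one (a list of species per Wyckoff site, the free fractional coordinates, and the free lattice parameters) rather than CrystalFormer's token sequence, and the noise scales `coord_sigma` and `lattice_sigma` are illustrative values. Choosing a site and an element uniformly at random, and adding zero-mean Gaussian noise, are all symmetric moves, so the proposal probabilities cancel in the acceptance ratio. The optional `temperature` divides the log-ratio, which is the quantity the annealing schedule manipulates.

```python
import math
import random

# Selection probabilities of the three proposal types, as stated in the text.
PROPOSAL_WEIGHTS = {"species": 0.2, "coords": 0.4, "lattice": 0.4}


def propose(state, elements, rng, coord_sigma=0.02, lattice_sigma=0.05):
    """Return a modified copy of `state` using one randomly chosen proposal.

    state: dict with 'species' (one element per Wyckoff site), 'coords'
    (free fractional coordinates only) and 'lattice' (free lattice parameters only).
    Every move is symmetric, so the proposal ratio cancels in the MH acceptance."""
    new = {key: list(value) for key, value in state.items()}
    kinds = list(PROPOSAL_WEIGHTS)
    kind = rng.choices(kinds, weights=[PROPOSAL_WEIGHTS[k] for k in kinds])[0]
    if kind == "species":
        site = rng.randrange(len(new["species"]))
        new["species"][site] = rng.choice(elements)
    elif kind == "coords":
        # Gaussian noise, wrapped back into the unit cell.
        new["coords"] = [(c + rng.gauss(0.0, coord_sigma)) % 1.0 for c in new["coords"]]
    else:
        new["lattice"] = [a + rng.gauss(0.0, lattice_sigma) for a in new["lattice"]]
    return new


def mh_step(state, log_pi, elements, rng, temperature=1.0):
    """One Metropolis-Hastings step targeting pi(x)**(1 / temperature)."""
    candidate = propose(state, elements, rng)
    if any(a <= 0.0 for a in candidate["lattice"]):
        return state, False  # unphysical lattice: treat pi(x') as 0 and reject
    log_ratio = (log_pi(candidate) - log_pi(state)) / temperature
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return candidate, True
    return state, False
```

`log_pi` can be any function that returns the unnormalized log target of a state, for example one built from the generative and predictive log-probabilities.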

To prevent the generated structures from being confined to known regions of configuration space, we incorporate a simulated annealing approach with ten temperature levels ranging from T = 10 to T = 1. The process begins with a crystal structure randomly selected from the Alexand20 dataset [schmidt2022dataset; schmidt2022large] (see Supplementary Section S1) as the initial state. Starting from T = 10, we allow the system to equilibrate at each temperature before cooling to the next level, continuing until convergence is reached at T = 1, at which point sampling is performed. Convergence at each temperature is determined from the rolling-window mean and standard deviation of , with a window size of 200 steps and a tolerance of 1e-3. Fig. 2 presents the convergence curves at T = 10 and T = 1. During the sampling phase, we record a sample every 100 Markov steps.
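The schedule and stopping rule can be sketched as follows. Evenly spaced levels 10, 9, ..., 1 are one reading of "ten temperature levels ranging from T = 10 to T = 1", and the exact comparison performed by the rolling-window test is not specified in the text. `RollingConvergence` below, which compares the mean and standard deviation of two consecutive 200-step windows against the 1e-3 tolerance, is therefore a hypothetical stand-in.

```python
import statistics
from collections import deque


def temperature_levels(t_high=10.0, t_low=1.0, n_levels=10):
    """Evenly spaced levels 10, 9, ..., 1 (the even spacing is an assumption)."""
    step = (t_high - t_low) / (n_levels - 1)
    return [t_high - i * step for i in range(n_levels)]


class RollingConvergence:
    """Hypothetical convergence test: the chain is considered equilibrated when the
    mean and the standard deviation of the last `window` monitored values both
    differ by less than `tol` from those of the preceding `window` values."""

    def __init__(self, window=200, tol=1e-3):
        self.window, self.tol = window, tol
        self.values = deque(maxlen=2 * window)

    def update(self, value):
        """Add one value; return True once the convergence criterion is met."""
        self.values.append(value)
        if len(self.values) < 2 * self.window:
            return False
        values = list(self.values)
        previous, last = values[: self.window], values[self.window :]
        d_mean = abs(statistics.fmean(last) - statistics.fmean(previous))
        d_std = abs(statistics.pstdev(last) - statistics.pstdev(previous))
        return d_mean < self.tol and d_std < self.tol
```

In an annealing loop, one would run MH steps at each level (dividing the log acceptance ratio by T), feed the monitored quantity to a fresh `RollingConvergence` until `update` returns True, move to the next level, and at T = 1 keep every 100th state.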
II.2 Crystal generation workflow

We have designed a workflow for high-throughput generation of crystal structures in specific domains, as illustrated in FIG. 3. This workflow encompasses conditional crystal structure generation, machine learning force field relaxation, crystal property evaluation, and structure deduplication. We applied this workflow to the generation of topological insulators and further validated the most promising candidate materials through first-principles calculations.
The workflow integrates our conditional generation framework PODGen, a general machine learning force field (MLFF) OpenLAM [peng2025openlam], a symmetry-based topological classification tool SymTopo [he2019symtopo], and first-principles calculation tools such as VASP [kresse1996efficiency], along with software packages including pymatgen [ong2013python], ASE [larsen2017atomic], VASPkit [wang2021vaspkit], and Phonopy [phonopy-phono3py-JPCM; phonopy-phono3py-JPSJ]. This workflow is transferable to the exploration of other condition-dependent crystalline materials by simply replacing the corresponding prediction models and crystal property evaluation tools.
In this high-throughput generation workflow, we first use the MLFF model to relax the generated structures. MLFF relaxation closely approximates first-principles results while being approximately three orders of magnitude faster; we employed the OpenLAM model released in October 2024 [peng2025openlam]. We then use SymTopo [he2019symtopo], an automated tool for calculating the topological properties of nonmagnetic crystalline materials, to quickly verify the topological properties of the new crystals. Finally, we use a structure-matching module from pymatgen [ong2013python] to determine whether two structures are similar. Duplicate checking covers the combined dataset [heyu] of Materiae [zhang2019catalogue] and TQC [tqc], as well as the newly generated crystals. For more details, please refer to Supplementary Section S2.
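Pairwise structure matching against tens of thousands of reference and generated crystals is the expensive part of deduplication. A common way to keep it tractable, sketched below as an assumption rather than the authors' implementation, is to bucket structures by a cheap invariant and call the matcher only within a bucket. `key` and `matcher` are caller-supplied; `matcher` could, for example, wrap `StructureMatcher.fit` from `pymatgen.analysis.structure_matcher`. Bucketing by reduced formula is safe for matchers that only match identical compositions, whereas also bucketing by detected space group can miss near-duplicates whose symmetry is detected differently.

```python
from collections import defaultdict


def deduplicate(structures, key, matcher, reference=()):
    """Return the structures that match neither a reference structure nor an
    earlier kept structure, preserving input order.

    key(s):       cheap invariant used to bucket candidates (e.g. reduced formula),
                  so that the expensive matcher is only called within a bucket.
    matcher(a, b): True if two structures are considered equivalent."""
    buckets = defaultdict(list)
    for ref in reference:
        buckets[key(ref)].append(ref)
    unique = []
    for s in structures:
        bucket = buckets[key(s)]
        if any(matcher(s, other) for other in bucket):
            continue  # duplicate of a reference or of an already kept structure
        bucket.append(s)
        unique.append(s)
    return unique
```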
II.3 Promising Crystals Validation
For structures with a direct band gap that are classified as topological insulators or topological crystalline insulators after MLFF relaxation, we further verify their properties and stability using first-principles calculations. This process includes DFT relaxation, reconfirmation of topological classification using SymTopo, and phonon spectrum analysis. To ensure computational efficiency and reproducibility, we primarily use pymatgen and VASPkit to generate input files for VASP, with all calculations performed using VASP 5.4.4.
We divide the relaxation into two steps. In the first step, we use VASP input files generated by pymatgen. However, because the default convergence criteria in pymatgen are not stringent enough (EDIFF is typically on the order of 1e-3), we refine the relaxation in a second step: once the first relaxation succeeds, we set EDIFF to 1e-5 and EDIFFG to -1e-3, so that the residual atomic forces fall below 1e-3 eV/Å.
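The second relaxation step only changes two INCAR tags. A minimal plain-text sketch, assuming one `TAG = value` pair per line (pymatgen's own `Incar` class would be the more robust choice), could look like this:

```python
def tighten_incar(incar_text, ediff=1e-5, ediffg=-1e-3):
    """Return INCAR text with EDIFF and EDIFFG set for the second relaxation step.

    Existing EDIFF/EDIFFG lines are replaced (dropping any trailing comment) and
    missing tags are appended. A negative EDIFFG makes VASP stop the ionic
    relaxation once all forces are below |EDIFFG| eV/Angstrom.
    Assumes one TAG = value pair per line."""
    overrides = {"EDIFF": f"{ediff:.0e}", "EDIFFG": f"{ediffg:.0e}"}
    lines, seen = [], set()
    for line in incar_text.splitlines():
        tag = line.split("=", 1)[0].strip().upper() if "=" in line else ""
        if tag in overrides:
            lines.append(f"{tag} = {overrides[tag]}")
            seen.add(tag)
        else:
            lines.append(line)
    for tag, value in overrides.items():
        if tag not in seen:
            lines.append(f"{tag} = {value}")
    return "\n".join(lines) + "\n"
```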
If both relaxation steps converge, we use SymTopo to reassess the topological properties and band gap of the relaxed structure. For structures that remain topological insulators with a direct band gap, we further compute their electronic band structures and phonon spectra. For band structure calculations, we directly reuse the CHGCAR and Fermi energy obtained from SymTopo's SCF calculation, together with the high-symmetry path generated by VASPkit, to plot the band structure. For phonon spectrum calculations, we employ a 2×2×2 supercell and the DFPT method; except for setting ENCUT to 1.3 times ENMAX, all other parameters are generated by VASPkit. Most DFT calculations in this paper use the PBE exchange-correlation functional [PBE], and some results obtained with the HSE06 functional [hse06] can be found in the Supplementary Information.
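The two numerical settings of the phonon step reduce to simple arithmetic. The sketch below assumes that ENMAX means the largest ENMAX among the POTCAR files of the species present, which is the usual convention but is not stated explicitly in the text; the values a caller passes in would be read from those POTCARs. It also makes the cost of the 2×2×2 supercell explicit: eight times as many atoms as the cell it is built from.

```python
def phonon_encut(enmax_by_species, factor=1.3):
    """ENCUT in eV for the DFPT phonon run: `factor` times ENMAX, where ENMAX is
    taken as the largest value among the POTCARs in use (an assumption)."""
    return factor * max(enmax_by_species.values())


def supercell_size(n_atoms, dims=(2, 2, 2)):
    """Number of atoms in the supercell built from a cell with `n_atoms` atoms."""
    size = n_atoms
    for d in dims:
        size *= d
    return size
```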
To further evaluate the synthesizability of these structures, we employed the Stochastic Surface Walking (SSW) method [ssw; sswcrystal] to explore their potential energy surfaces. This not only allowed us to determine the locations of the generated crystal structures on the PES but also enabled the discovery of more stable configurations.
III Results
III.1 Topological Material Condition Generation

Method | TI fraction | TCI fraction | TI:TCI ratio |
---|---|---|---|
General generation | 2.85% | 2.45% | 1.16:1 |
Conditional generation | 15.25% | 9.39% | 1.62:1 |
Using the method described above, we generated 84,726 crystals, 78,110 of which were successfully relaxed by OpenLAM, with the maximum atomic force below 0.02 eV/Å and a predicted formation energy below 1.0 eV/atom. A topological classification was obtained from SymTopo for 78,575 structures. After removing duplicate structures, 11,914 unique crystals were classified as TI and 7,336 as TCI, corresponding to proportions of 15.25% and 9.39%, respectively. Among these, 68 TIs are identified as having direct band gaps, 63 of which also have indirect band gaps; of the 36 TCIs identified as having direct band gaps, 34 also have indirect band gaps.
We also explored generating crystal structures directly with CrystalFormer for topological material screening. Among 2,000 generated materials, only 57 were identified as TI and 49 as TCI, corresponding to 2.85% and 2.45%, respectively. This generation efficiency is significantly lower than that achieved with conditional generation. More importantly, no gapped TIs were found among them. Furthermore, without conditioning, the ratio of TI to TCI was 1.16:1, whereas when generating materials conditioned on TI (excluding TCI), this ratio increased to 1.62:1. As shown in Supplementary Table S1, the topological classification model we employed is relatively basic, and TI and TCI are known to be categories that predictive models tend to confuse. Nevertheless, our conditional generation framework significantly improves both the success rate and the proportion of materials with the desired topological properties, demonstrating its robustness. We expect that employing a state-of-the-art (SOTA) predictive model with refined training will further enhance generation efficiency.
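The headline ratios follow directly from the counts. In the sketch below, using the 78,110 successfully relaxed structures as the denominator for conditional generation is an assumption; it reproduces the quoted 15.25%, and the script then yields the 2.85% baseline, the 5.35-fold enrichment, and the 1.62:1 and 1.16:1 TI:TCI ratios.

```python
# Counts quoted in the text. Using the 78,110 successfully relaxed structures as
# the denominator for conditional generation is an assumption.
cond_ti, cond_tci, cond_total = 11914, 7336, 78110  # PODGen conditional generation
base_ti, base_tci, base_total = 57, 49, 2000        # unconstrained CrystalFormer

cond_ti_rate = cond_ti / cond_total
base_ti_rate = base_ti / base_total
enrichment = cond_ti_rate / base_ti_rate

print(f"TI success rate: {cond_ti_rate:.2%} vs {base_ti_rate:.2%} ({enrichment:.2f}x)")
print(f"TI:TCI ratio: {cond_ti / cond_tci:.2f}:1 vs {base_ti / base_tci:.2f}:1")
```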
We conducted a statistical analysis of the 19,250 generated TI and TCI materials. FIG. 4(a) shows the occurrence frequency of each element in these materials. Compared with the CrystalFormer training set (Supplementary Fig. S2), the elemental distribution of the generated crystals is substantially altered and more closely resembles the distribution of topological insulators in the existing database [heyu] (Supplementary Fig. S4). This indicates that our conditional generation framework effectively shifts the baseline distribution of CrystalFormer toward the target distribution characteristic of topological materials. Further analysis reveals that, although elements such as B and Ge retain high occurrence frequencies similar to those in the existing database, the most prevalent element shifts from O in the original database to H in the newly generated materials. This shift indicates that our framework not only aligns with the existing distribution but also explores compositional spaces beyond the original datasets, leading to the discovery of novel crystal structures.
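"Occurrence frequency" can be defined in more than one way; one plausible definition, sketched below as an assumption, is the fraction of materials whose composition contains a given element. The parser handles only simple formulas without parentheses or hydrate notation.

```python
import re
from collections import Counter

# Element symbol followed by an optional (possibly fractional) count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def elements_in(formula):
    """Set of element symbols in a simple formula such as 'Bi2Se3' (no parentheses)."""
    return {symbol for symbol, _ in ELEMENT.findall(formula)}


def occurrence_frequency(formulas):
    """Fraction of materials containing each element, most frequent first."""
    counts = Counter()
    for formula in formulas:
        counts.update(elements_in(formula))
    n = len(formulas)
    return {element: c / n for element, c in counts.most_common()}
```

Counting each element once per material keeps H-rich and O-rich compositions from being over-weighted by their stoichiometric coefficients; counting atoms instead would be the other natural choice.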
FIG. 4(b) presents the formation energy distribution of the 19,250 generated TI and TCI materials, with formation energies predicted concurrently during relaxation using OpenLAM. Although the distribution does not fully align with the ideal scenario where all formation energies are negative, it closely resembles the formation energy distribution of the CrystalFormer training set (Supplementary Fig. S3). This observation suggests that for properties not explicitly constrained by our conditional generation framework—such as formation energy, which has little direct correlation with topological properties—the generated crystal structures largely adhere to the inherent distribution of the base model.
III.2 Promising Materials Validation

Although materials without a band gap can still be identified as TIs by symmetry-based topological classification methods [he2019symtopo; tqcmethod], gapped TIs are generally considered more promising for practical applications. Therefore, from the generated TI and TCI materials, we selected 104 candidates with a direct band gap for further validation through first-principles calculations. The verification process is outlined in FIG. 4(c). Among these materials, 88 were successfully relaxed, and 50 retained their direct band gap and TI or TCI classification after relaxation. Notably, 12 of these materials exhibited phonon spectra without imaginary frequencies. The crystal structures, SymTopo classification results, topological indices, band structures, and phonon spectra of these 12 materials are presented in FIG. 5 and Supplementary Fig. S6.
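The phonon criterion used for the final selection can be stated as a short check. The tolerance below is a hypothetical choice: small negative values of the acoustic branches near Γ are common numerical artifacts, so a strict "no negative frequency" test would reject genuinely stable structures, while a loose one would hide real soft modes.

```python
def is_dynamically_stable(band_frequencies, tol=-0.05):
    """Return True if no phonon mode is imaginary beyond a small tolerance.

    band_frequencies: for each q-point, the mode frequencies in THz, with imaginary
    modes reported as negative numbers (the convention used by phonopy).
    tol: HYPOTHETICAL threshold that tolerates tiny negative values of the acoustic
    branches near Gamma caused by numerical noise."""
    return all(f >= tol for q_point in band_frequencies for f in q_point)


print(is_dynamically_stable([[0.0, -0.01, 3.2], [1.1, 2.4, 5.0]]))  # True: noise only
print(is_dynamically_stable([[0.0, -0.8, 3.2], [1.1, 2.4, 5.0]]))   # False: soft mode
```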

We applied the SSW method to explore the PES of these 12 materials in order to further evaluate their experimental synthesizability. As shown in FIG. 6, five of these materials are located at the bottom of their PES and exhibit negative energy above hull (FIG. 5 and Supplementary Fig. S6), indicating a higher likelihood of experimental realization. The PES landscapes of the remaining materials are shown in Supplementary Fig. S7.
III.3 WannierTools confirmation

To further validate our results, we selected several materials and constructed Wannier tight-binding models [tight1; tight2; tight3] using Wannier90 [wannier90]. We then used the WannierTools package [wu2018wanniertools] to calculate surface states and Wilson loops [z21; z22; yu2011equivalent] based on these Wannier models. Open boundary conditions were imposed along different crystallographic directions: FIG. 7(a) presents the boundary-state spectrum of , with an open boundary on the (010) surface; FIG. 7(b) and FIG. 7(c) show the spectra of and , respectively, with an open boundary on the (100) surface; and FIG. 7(d) displays with an open boundary on the (010) surface. As illustrated in the figures, pronounced in-gap boundary states are clearly observed, indicating that these materials are topological (crystalline) insulators. Detailed Wilson loop calculations can be found in Supplementary Fig. S5.
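As a toy illustration of the Wilson-loop idea (not of how WannierTools handles the multiband three-dimensional Wannier models used here), the sketch below computes the Berry (Zak) phase of the one-dimensional Su-Schrieffer-Heeger (SSH) chain from a discretized loop of overlaps between neighboring Bloch states. The phase is π in the topological regime (inter-cell hopping `w` larger than intra-cell hopping `v`) and 0 in the trivial regime; the model and parameter names are chosen only for this example.

```python
import cmath
import math


def ssh_lower_band_state(k, v, w):
    """Lower-band eigenvector of the SSH Bloch Hamiltonian H(k) = [[0, q*], [q, 0]]
    with q = v + w e^{ik}: u(k) = (1, -e^{i phi}) / sqrt(2), phi = arg q.
    This gauge depends only on phi(k), so it is periodic in k. Requires v != w."""
    q = v + w * cmath.exp(1j * k)
    phase = q / abs(q)
    return (1 / math.sqrt(2), -phase / math.sqrt(2))


def wilson_loop_phase(v, w, n_k=200):
    """Berry phase gamma = -arg(W) of the discrete Wilson loop
    W = prod_n <u(k_n)|u(k_{n+1})> over k_0 = 0, ..., k_N = 2*pi."""
    ks = [2 * math.pi * n / n_k for n in range(n_k + 1)]
    states = [ssh_lower_band_state(k, v, w) for k in ks]
    loop = 1.0 + 0j
    for a, b in zip(states[:-1], states[1:]):
        loop *= a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return -cmath.phase(loop)


print(wilson_loop_phase(0.5, 1.0), wilson_loop_phase(1.0, 0.5))
```

Because the Berry phase is only defined modulo 2π, the topological case may print either π or -π.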
IV Conclusion
Crystal structure generation models are powerful tools for discovering novel crystalline materials. However, when searching for structures with specific properties, conditional generation methods can significantly enhance efficiency. In this study, we developed PODGen, a highly transferable and robust conditional generation framework that integrates a general crystal generation model with multiple property prediction models. Any generative model capable of providing crystal structure probabilities, and most existing predictive models, can be seamlessly incorporated into this framework, which imposes minimal requirements on their predictive capabilities. Moreover, once the base model is trained, conditional generation for a specific domain requires only training an appropriate predictive model. This significantly reduces the dependence on large domain-specific datasets and lowers training costs.
For properties explicitly conditioned by a predictive model (or those with strong correlations), our approach effectively guides the base model—originally trained to follow the distribution of the training dataset—toward generating structures that conform to a desired target distribution. Conversely, for properties without an associated predictive model (or those with weak correlations), the generated structures continue to follow the original training distribution.
We applied this framework to the conditional generation of topological insulator materials, achieving a success rate 5.35 times higher than that of a conventional generative model. More importantly, the stricter the property constraints on the generated crystals, the greater the advantage of our framework over general generative models. For example, in generating gapped TIs, our framework succeeds where general methods almost entirely fail. Using this method, we generated over 80,000 structures, nearly 20,000 of which were identified as TI or TCI. Further first-principles calculations on the subset with direct band gaps identified 12 materials with promising application potential. Five of these structures are located near the global minima of the PES, suggesting a higher likelihood of experimental synthesis. Finally, we used WannierTools to verify the topological character of selected materials.
There is still room for improvement in our framework. We adopted CrystalFormer as the base model; however, during the MCMC state updates, we modified only atomic species, atomic positions, and lattice constants, leaving Wyckoff positions and space groups unchanged. This limitation arises because, in the string-based crystal structure representation used by CrystalFormer, Wyckoff positions are interdependent and would require a more sophisticated update strategy. In addition, modifying the space group would fundamentally alter the entire structural representation in CrystalFormer.
V Code and Data
Our code is available at http://github.com/cyye001/PODGen. The dataset of generated crystals will be made available on the website of the Condensed Matter Physics Data Center of the Chinese Academy of Sciences (http://cmpdc.iphy.ac.cn/materialsgalaxy/#/services/materials) and can be downloaded from the Electronic Laboratory for Material Science (http://in.iphy.ac.cn/eln/link.html#/113/G9f5).
References
- (1) Zhang, D. et al. Dealing with the foreign-body response to implanted biomaterials: strategies and applications of new materials. Advanced Functional Materials 31, 2007226 (2021).
- (2) Hu, X., Deng, Z., Lin, X., Xie, Y. & Teodorescu, R. Research directions for next-generation battery management solutions in automotive applications. Renewable and Sustainable Energy Reviews 152, 111695 (2021).
- (3) Disa, A. S., Nova, T. F. & Cavalleri, A. Engineering crystal structures with light. Nature Physics 17, 1087–1092 (2021).
- (4) Butt, M., Khonina, S. N. & Kazanskiy, N. Recent advances in photonic crystal optical devices: A review. Optics & Laser Technology 142, 107265 (2021).
- (5) Chaudhary, V. S., Kumar, D., Pandey, B. P. & Kumar, S. Advances in photonic crystal fiber-based sensor for detection of physical and biochemical parameters—a review. IEEE sensors journal 23, 1012–1023 (2022).
- (6) Agrawal, A. & Choudhary, A. Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science. APL Materials 4 (2016).
- (7) Fang, J. et al. Machine learning accelerates the materials discovery. Materials Today Communications 33, 104900 (2022).
- (8) Wang, Z., Hua, H., Lin, W., Yang, M. & Tan, K. C. Crystalline material discovery in the era of artificial intelligence (2025). URL http://arxiv.org/abs/2408.08044. eprint 2408.08044.
- (9) Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters 120, 145301 (2018).
- (10) Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials 31, 3564–3572 (2019).
- (11) Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Computational Materials 7, 185 (2021).
- (12) Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. arXiv preprint arXiv:2011.14115 (2020).
- (13) Gasteiger, J., Becker, F. & Günnemann, S. GemNet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems 34, 6790–6802 (2021).
- (14) Ruff, R., Reiser, P., Stühmer, J. & Friederich, P. Connectivity optimized nested line graph networks for crystal structures. Digital Discovery 3, 594–601 (2024).
- (15) Liang, C. et al. Material symmetry recognition and property prediction accomplished by crystal capsule representation. Nature Communications 14, 5198 (2023).
- (16) Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science 2, 718–728 (2022).
- (17) Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nature Machine Intelligence 5, 1031–1041 (2023).
- (18) Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
- (19) Liang, C. et al. A cluster-based deep learning model perceiving series correlation for accurate prediction of phonon spectrum. Advanced Science 11, 2406183 (2024).
- (20) Yang, H. et al. MatterSim: A deep learning atomistic model across elements, temperatures and pressures. arXiv preprint arXiv:2405.04967 (2024).
- (21) Barroso-Luque, L. et al. Open Materials 2024 (OMat24) inorganic materials dataset and models. arXiv preprint arXiv:2410.12771 (2024).
- (22) Yin, B. et al. AlphaNet: Scaling up local frame-based atomistic foundation model. arXiv preprint arXiv:2501.07155 (2025).
- (23) Zhong, Y., Yu, H., Su, M., Gong, X. & Xiang, H. Transferable equivariant graph neural networks for the Hamiltonians of molecules and solids. npj Computational Materials 9, 182 (2023).
- (24) Zhong, Y. et al. Universal machine learning Kohn–Sham Hamiltonian for materials. Chinese Physics Letters 41, 077103 (2024).
- (25) Wang, Y. et al. DeepH-2: Enhancing deep-learning electronic structure via an equivariant local-coordinate transformer. arXiv preprint arXiv:2401.17015 (2024).
- (26) Gong, X. et al. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian. Nature Communications 14, 2848 (2023).
- (27) Liu, J., Fang, Z., Weng, H. & Wu, Q. DPmoire: A tool for constructing accurate machine learning force fields in moiré systems (2025). URL http://arxiv.org/abs/2412.19333. eprint 2412.19333.
- (28) Xie, T., Fu, X., Ganea, O.-E., Barzilay, R. & Jaakkola, T. Crystal diffusion variational autoencoder for periodic material generation. arXiv preprint arXiv:2110.06197 (2021).
- (29) Jiao, R. et al. Crystal structure prediction by joint equivariant diffusion. Advances in Neural Information Processing Systems 36, 17464–17497 (2023).
- (30) Jiao, R., Huang, W., Liu, Y., Zhao, D. & Liu, Y. Space group constrained crystal generation. arXiv preprint arXiv:2402.03992 (2024).
- (31) Joshi, C. K. et al. All-atom diffusion transformers: Unified generative modelling of molecules and materials. arXiv preprint arXiv:2503.03965 (2025).
- (32) Gebauer, N., Gastegger, M. & Schütt, K. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. Advances in Neural Information Processing Systems 32 (2019).
- (33) Flam-Shepherd, D. & Aspuru-Guzik, A. Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files. arXiv preprint arXiv:2305.05708 (2023).
- (34) Cao, Z., Luo, X., Lv, J. & Wang, L. Space group informed transformer for crystalline materials generation. arXiv preprint arXiv:2403.15734 (2024).
- (35) Miller, B. K., Chen, R. T., Sriram, A. & Wood, B. M. FlowMM: Generating materials with Riemannian flow matching. In Forty-first International Conference on Machine Learning (2024).
- (36) Luo, X. et al. CrystalFlow: A flow-based generative model for crystalline materials. arXiv preprint arXiv:2412.11693 (2024).
- (37) Qiu, Z. et al. VQCrystal: Leveraging vector quantization for discovery of stable crystal structures. arXiv preprint arXiv:2409.06191 (2024).
- (38) Sriram, A., Miller, B., Chen, R. T. & Wood, B. FlowLLM: Flow matching for material generation with large language models as base distributions. Advances in Neural Information Processing Systems 37, 46025–46046 (2024).
- (39) Zeni, C. et al. A generative model for inorganic materials design. Nature 1–3 (2025).
- (40) Chen, Y. et al. MatterGPT: A generative transformer for multi-property inverse design of solid-state materials. arXiv preprint arXiv:2408.07608 (2024).
- (41) Ye, C.-Y., Weng, H.-M. & Wu, Q.-S. Con-CDVAE: A method for the conditional generation of crystal structures. Computational Materials Today 1, 100003 (2024).
- (42) Luo, X. et al. Deep learning generative model for crystal structure prediction. npj Computational Materials 10, 254 (2024).
- (43) Cao, Z. & Wang, L. CrystalFormer-RL: Reinforcement fine-tuning for materials design. arXiv preprint arXiv:2504.02367 (2025).
- (44) Xu, H., Qian, D., Liu, Z., Jiang, Y. & Wang, J. Design topological materials by reinforcement fine-tuned generative model. arXiv preprint arXiv:2504.13048 (2025).
- (45) Li, Z., Liu, S., Ye, B., Srolovitz, D. J. & Wen, T. Active learning for conditional inverse design with crystal generation and foundation atomic models. arXiv preprint arXiv:2502.16984 (2025).
- (46) Han, X.-Q. et al. InvDesFlow: An AI-driven materials inverse design workflow to explore possible high-temperature superconductors. Chinese Physics Letters 42, 047301 (2025). URL http://iopscience.iop.org/article/10.1088/0256-307X/42/4/047301.
- (47) Choudhary, K. & Garrity, K. Designing high-Tc superconductors with BCS-inspired screening, density functional theory, and deep-learning. npj Computational Materials 8, 244 (2022).
- (48) Lyngby, P. & Thygesen, K. S. Data-driven discovery of 2D materials by deep generative models. npj Computational Materials 8, 232 (2022).
- (49) Hong, T. et al. Discovery of new topological insulators and semimetals using deep generative models. npj Quantum Materials 10, 12 (2025).
- (50) Yamazaki, S. et al. Multi-property directed generative design of inorganic materials through Wyckoff-augmented transfer learning. arXiv preprint arXiv:2503.16784 (2025).
- (51) Bradlyn, B. et al. Topological quantum chemistry. Nature 547, 298–305 (2017).
- (52) Miao, N., Zhou, H., Mou, L., Yan, R. & Li, L. CGMH: Constrained sentence generation by Metropolis-Hastings sampling. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 6834–6842 (2019).
- (53) Zhang, M., Jiang, N., Li, L. & Xue, Y. Language generation via combinatorial constraint satisfaction: A tree search enhanced Monte-Carlo approach. arXiv preprint arXiv:2011.12334 (2020).
- (54) Song, Z. et al. LLM-Feynman: Leveraging large language models for universal scientific formula and theory discovery. arXiv preprint arXiv:2503.06512 (2025).
- (55) Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087–1092 (1953).
- (56) Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970). URL http://doi.org/10.1093/biomet/57.1.97.
- (57) He, Y. et al. SymTopo: An automatic tool for calculating topological properties of nonmagnetic crystalline materials. Chinese Physics B 28, 087102 (2019).
- (58) Vergniory, M. et al. A complete catalogue of high-quality topological materials. Nature 566, 480–485 (2019).
- (59) Zhang, T. et al. Catalogue of topological electronic materials. Nature 566, 475–479 (2019).
- (60) Vergniory, M. G. et al. All topological bands of all nonmagnetic stoichiometric materials. Science 376, eabg9094 (2022).
- (61) Schmidt, J., Wang, H.-C., Cerqueira, T. F., Botti, S. & Marques, M. A. A dataset of 175k stable and metastable materials calculated with the PBEsol and SCAN functionals. Scientific Data 9, 64 (2022).
- (62) Schmidt, J. et al. Large-scale machine-learning-assisted exploration of the whole materials space. arXiv preprint arXiv:2210.00579 (2022).
- (63) Peng, A., Liu, X., Guo, M.-Y., Zhang, L. & Wang, H. The OpenLAM challenges. arXiv preprint arXiv:2501.16358 (2025).
- (64) Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Computational Materials Science 6, 15–50 (1996).
- (65) Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Materials Science 68, 314–319 (2013).
- (66) Larsen, A. H. et al. The atomic simulation environment: a Python library for working with atoms. Journal of Physics: Condensed Matter 29, 273002 (2017).
- (67) Wang, V., Xu, N., Liu, J.-C., Tang, G. & Geng, W.-T. VASPKIT: A user-friendly interface facilitating high-throughput computing and analysis using VASP code. Computer Physics Communications 267, 108033 (2021).
- (68) Togo, A., Chaput, L., Tadano, T. & Tanaka, I. Implementation strategies in phonopy and phono3py. Journal of Physics: Condensed Matter 35, 353001 (2023).
- (69) Togo, A. First-principles phonon calculations with phonopy and phono3py. Journal of the Physical Society of Japan 92, 012001 (2023).
- (70) He, Y. Machine Learning topological characteristics from multiple electronic materials databases. Ph.D. thesis, UCL-Université Catholique de Louvain (2023).
- (71) Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Physical Review Letters 77, 3865 (1996).
- (72) Peralta, J. E., Heyd, J., Scuseria, G. E. & Martin, R. L. Spin-orbit splittings and energy band gaps calculated with the Heyd-Scuseria-Ernzerhof screened hybrid functional. Physical Review B 74, 073101 (2006).
- (73) Shang, C. & Liu, Z.-P. Stochastic surface walking method for structure prediction and pathway searching. Journal of Chemical Theory and Computation 9, 1838–1845 (2013).
- (74) Shang, C., Zhang, X.-J. & Liu, Z.-P. Stochastic surface walking method for crystal structure and phase transition pathway prediction. Physical Chemistry Chemical Physics 16, 17845–17856 (2014).
- (75) Marzari, N. & Vanderbilt, D. Maximally localized generalized Wannier functions for composite energy bands. Physical Review B 56, 12847 (1997).
- (76) Souza, I., Marzari, N. & Vanderbilt, D. Maximally localized Wannier functions for entangled energy bands. Physical Review B 65, 035109 (2001).
- (77) Marzari, N., Mostofi, A. A., Yates, J. R., Souza, I. & Vanderbilt, D. Maximally localized Wannier functions: Theory and applications. Reviews of Modern Physics 84, 1419–1475 (2012).
- (78) Mostofi, A. A. et al. An updated version of wannier90: A tool for obtaining maximally-localised Wannier functions. Computer Physics Communications 185, 2309–2310 (2014).
- (79) Wu, Q., Zhang, S., Song, H.-F., Troyer, M. & Soluyanov, A. A. WannierTools: An open-source software package for novel topological materials. Computer Physics Communications 224, 405–416 (2018).
- (80) Fu, L. & Kane, C. L. Topological insulators with inversion symmetry. Physical Review B 76, 045302 (2007).
- (81) Fu, L., Kane, C. L. & Mele, E. J. Topological insulators in three dimensions. Physical Review Letters 98, 106803 (2007).
- (82) Yu, R., Qi, X.-L., Bernevig, A., Fang, Z. & Dai, X. Equivalent expression of Z2 topological invariant for band insulators using the non-Abelian Berry connection. Physical Review B 84, 075119 (2011).
Acknowledgements
We thank Shigang Ou, Ruihan Zhang, Jingyu Yao, Yue Xie, Yi Yan and Yuanchen Shen for useful discussions. This work was supported by the Science Center of the National Natural Science Foundation of China (Grant No. 12188101), the National Key Research and Development Program of China (Grant Nos. 2023YFA1607400 and 2022YFA1403800), and the National Natural Science Foundation of China (Grant Nos. 12274436 and 11921004). H.W. acknowledges support from the New Cornerstone Science Foundation through the XPLORER PRIZE.
Author contributions
C.Y., H.W. and Q.W. conceived the idea and performed the analysis. C.Y. developed and implemented the PODGen framework, designed and executed the crystal generation workflow, and conducted the DFT, SymTopo, and DFPT calculations. Y.W. performed the WannierTools calculations. X.X. and Z.L. carried out the SSW calculations. T.Z. presented the results on the website. Y.H. provided the topological materials dataset and performed data cleaning. All authors contributed to the interpretation of the results and the writing of the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information
Correspondence and requests for materials should be addressed to Quansheng Wu.