Lai Wei, Sadman Sadeed Omee
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
\And Rongzhi Dong, Nihang Fu
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
\And Yuqi Song
Department of Computer Science
University of Southern Maine
Portland, Maine, 04101
\And Edirisuriya M. D. Siriwardane
Department of Physics
University of Colombo
Sri Lanka
\And Meiling Xu
School of Physics and Electronic Engineering
Jiangsu Normal University
Xuzhou, China
\And Chris Wolverton
Department of Materials Science and Engineering
Northwestern University
Chicago, USA
\And Jianjun Hu*
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29201
jianjunh@cse.sc.edu
Abstract
Crystal structure prediction (CSP) is now increasingly used in discovering novel materials with applications in diverse industries. However, despite decades of development and significant progress, the field still lacks well-defined benchmark datasets, quantitative performance metrics, and studies that evaluate its status. We aim to fill this gap by introducing a CSP benchmark suite with 180 test structures along with our recently implemented set of CSP performance metrics. We benchmark a collection of 13 state-of-the-art (SOTA) CSP algorithms, including template-based CSP algorithms, conventional CSP algorithms based on DFT calculations and global search such as CALYPSO, CSP algorithms based on machine learning (ML) potentials and global search, and distance-matrix-based CSP algorithms. Our results demonstrate that the performance of current CSP algorithms is far from satisfactory: most algorithms cannot even identify structures with the correct space groups, except for the template-based algorithms when applied to test structures with similar templates. We also find that ML-potential-based CSP algorithms can now achieve performance competitive with DFT-based algorithms, and that their performance is strongly determined by the quality of the neural potentials as well as the global optimization algorithms. Our benchmark suite comes with a comprehensive open-source codebase and 180 well-selected benchmark crystal structures, making it convenient to evaluate the advantages and disadvantages of CSP algorithms in future studies. All code and benchmark data are available at https://github.com/usccolumbia/cspbenchmark
Keywords crystal structure prediction $\cdot$ materials discovery $\cdot$ benchmark $\cdot$ neural network potential $\cdot$ deep learning
1 Introduction
The critical assessment of protein structure prediction (CASP) and advancements like AlphaFold have significantly propelled research in predicting protein structures [1, 2]. Similarly, crystal structure prediction (CSP) methods have gained attention for organic molecules [3]. The focus on crystal structure prediction within the domain of inorganic materials is also steadily growing and proving to be vital for discovering new materials across diverse industries. Understanding the crystal structure of a material holds immense importance, as it significantly influences the material's physical, chemical, and mechanical properties. Traditionally, crystal structures have been determined using computational techniques such as density functional theory (DFT) calculations coupled with global search algorithms, or through tailored experiments. While successful for many materials, these methods are often time-consuming, costly, and particularly challenging when dealing with novel or intricate compounds. Therefore, quantitative evaluation metrics and benchmarks for CSP are becoming increasingly indispensable for advancing research in inorganic materials and guiding their practical utilization.
Nowadays, a plethora of approaches for crystal structure prediction exists, including evolutionary algorithms [4, 5, 6], data mining [7, 8], and machine learning [9, 10]. Current methods for evaluating predicted structures are mainly based on manual structural inspection, comparison with experimentally observed structures, comparison of energy or enthalpy values, success rate analysis, and computation of distances between structures. Nevertheless, the absence of a quantitative approach for evaluating predicted structures remains a challenge, hindering the ability to confidently ascertain their reliability and guide experimental validation. As the field progresses, developing robust quantitative evaluation methods is essential to unlock new frontiers in materials research and development. In this paper, we conducted a large-scale benchmark study of the main CSP algorithms selected from Table 1, including ab initio CSP algorithms [11, 12], GNOA (Graph Networks for crystal structure Optimization with Atomistic potentials) with ML potentials [9], AGOX with the M3GNet potential [13], random structure search in Atomistic Global Optimization X (AGOX), basin-hopping [14], parallel tempering, local GPR basin-hopping [15], an evolutionary algorithm, the Bayesian optimization method GOFEE [16], template-based CSP [17, 18], and DL-based CSP algorithms. Ab initio methods calculate the electronic structure and total energy of a crystal system using quantum mechanical principles. CrySPY and XtalOpt are two widely used ab initio CSP algorithms that employ optimization algorithms to search for stable crystal structures with low energies. AGOX, in our configuration, employs a machine-learned potential called M3GNet, which is designed to accurately describe the energy landscape of materials, explore the vast configuration space of crystal structures, and identify stable candidates.
The basin-hopping algorithm stochastically explores the potential energy landscape, efficiently searching for the global minimum by jumping between basins (local energy minima). DL-based crystal structure prediction algorithms utilize neural networks and deep learning techniques to learn representations of atomic structures and predict their energies or stability. GNOA utilizes machine learning potentials to predict crystal structures: it employs graph networks, which can represent atomic structures efficiently, and atomistic potentials learned from data to optimize the crystal structures. We also compared the performances of the non-DFT-based CSP algorithms with the leading DFT-based CSP algorithms, CALYPSO [19] and USPEX [4]. We then analyze and evaluate these CSP algorithms and use them to generate target structures; by calculating metrics with our quantitative evaluation method, we can examine the performance of each algorithm. To assess performance, we calculate a set of quantitative metrics for CSP benchmarking, which serve as objective measures of the algorithms' accuracy, efficiency, and reliability in predicting crystal structures. Through this evaluation process, we aim to provide a clear picture of the capabilities of each algorithm and how they compare with one another. The results obtained from the benchmarking analysis shed light on their performance in predicting known crystal structures and in identifying novel structures not present in the training dataset. Our quantitative evaluation metrics ensure that the analysis is conducted in a fair and unbiased manner, allowing for meaningful comparisons.
Overall, this thorough analysis and evaluation are fundamental to advancing the state-of-the-art in CSP and accelerating the discovery of new materials with tailored properties for diverse applications.
Another category of modern CSP algorithms combines global search with machine learning potential functions for structure search. Early attempts mainly used specialized ML potentials that covered only one or a few element types: in [20], ML potentials for four systems (Al, C, He, and Xe) were trained for CSP using the USPEX algorithm. A follow-up study [21] used active learning to develop an ML potential and applied it to the CSP of carbon allotropes, sodium structures under pressure, and boron allotropes. The CALYPSO algorithm has also been combined with machine learning potentials for structure prediction of boron (B) clusters [22], 24-atom cubic boron phases [23], and 4,096-atom gallium nitride (GaN) phase simulation [24]. However, these ML potentials are not universal enough to cover all, or even a majority, of the elements of the periodic table.
Recent ML potentials have been developed to cover a large portion of the periodic table. Takamoto et al. [25] developed TeaNet, a 16-layer graph convolution network with a residual network (ResNet) architecture and recurrent GCN weight initialization, for the simulation of metals and amorphous SiO_{2} structures. Their universal model initially covered 18 elements and was later extended to 45 elements [26]. Their neural network potential has been shown to speed up the simulation of lithium diffusion in LiFeSO_{4}F, molecular adsorption in metal-organic frameworks, an order-disorder transition of CuAu alloys, and material discovery for a Fischer-Tropsch catalyst. Choudhary et al. developed a graph neural network ML potential and combined it with a genetic algorithm for crystal structure prediction of alloys [27].
2 Method
2.1 Summary of main categories of CSP algorithms
\begin{table}[h]
\centering
\begin{tabular}{llllll}
\hline
Algorithm & Year & Category & Open-source & URL link & Language \\
\hline
USPEX [4] & 2006 & De novo (DFT) & No & link & Matlab \\
CALYPSO [19] & 2010 & De novo (DFT) & No & link & Python \\
ParetoCSP [28] & 2024 & MOGA + MLP & Yes & link & Python \\
GNOA [9] & 2022 & BO/PSO + MLP & Yes & link & Python \\
TCSP [17] & 2022 & Template & Yes & link & Python \\
CSPML [29] & 2022 & Template & Yes & link & Python \\
GATor [30] & 2018 & GA + FHI potential & Yes & link & Python \\
AIRSS [31, 32] & 2011 & Random + DFT or pair potential & Yes & link & Fortran \\
GOFEE [33] & 2020 & Active learning + Gaussian potential & Yes & link & Python \\
AGOX [13] & 2022 & Search + Gaussian potential & Yes & link & Python \\
GASP [34] & 2007 & GA + DFT & Yes & link & Java \\
M3GNet [35] & 2022 & Relax with MLP & Yes & link & Python \\
ASLA [36] & 2020 & NN + RL & No & link & N/A \\
CrySPY [37] & 2023 & GA/BO + DFT & Yes & link & Python \\
XtalOpt [38] & 2011 & GA + DFT & Yes & link & C++ \\
AlphaCrystal [39, 40] & 2023 & GA + DL & Yes & link & Python \\
\hline
\end{tabular}
\end{table}
2.1.1 ab initio CSP
There are several open-source CSP codes that combine search algorithms with DFT energy calculation, including CrySPY [37], XtalOpt [38], GASP [34], and AIRSS [31, 32]. However, the most widely used and well-established software packages for de novo CSP are the GA-based USPEX and the particle swarm optimization (PSO) based CALYPSO [19]. Despite their closed-source code, their binary programs can be easily obtained, and both come with several advanced search techniques such as symmetry handling and crowding/niching. Due to the computational costs associated with DFT-based CALYPSO, we selected a subset of 23 structures from the test dataset for prediction. In CALYPSO, structures within the first population are randomly generated while adhering to proper physical constraints, such as interatomic distances and crystal symmetry. Similar crystal structures are then removed using structure characterization techniques, including bond characterization metrics and a coordination characterization function, to streamline the search. Once all structures of each population are established, local optimizations are performed using DFT-based methods to locate the local minima. Structural evolution is further carried out using swarm intelligence algorithms such as particle swarm optimization or artificial bee colony, and new structures are generated based on the information gathered from the previous generation. By combining random structure generation, local optimization, and swarm intelligence, the CALYPSO method efficiently explores the potential energy surface (PES), increasing the chances of locating the global energy minimum. In summary, the general idea is to iteratively generate and optimize structures to navigate the complex energy landscape.
Due to the demanding computational costs of DFT calculations, we allocated 3,000 DFT energy calculations to each of the experimental runs for the different benchmark test samples. Here, the structural relaxations are performed using the Vienna Ab initio Simulation Package (VASP) [41], with the Perdew-Burke-Ernzerhof generalized gradient approximation [42] for the exchange-correlation functional and the projector-augmented-wave potentials [43] for the electron-ion interactions. VASP allows geometry optimization using different optimization algorithms, such as the conjugate gradient (CG) method and the quasi-Newton RMM-DIIS algorithm. The main VASP running parameters are the plane-wave cutoff energy (the maximum kinetic energy for the electronic wavefunctions), the Monkhorst-Pack k-meshes (sampling of the Brillouin zone), and the energy and force convergence criteria. By gradually optimizing the structure while adjusting these parameters, the optimization process can be accelerated while still obtaining reliable and accurate results, saving overall time in structure prediction by efficiently exploring the configuration space and converging toward the optimal structure. The relevant DFT parameter settings used by CALYPSO during the optimization process are listed in detail in Table S7 of the supplementary file.
2.1.2 GNOA with MLpotentials
Our benchmark includes the GNOA algorithms [9], a machine learning framework for crystal structure prediction. In this framework, a graph network (GN) is employed to establish a correlation model between crystal structures and their formation enthalpies, and an optimization algorithm (OA) is utilized to accelerate the search for the crystal structure with the lowest formation enthalpy. In this work, we evaluate the CSP algorithms based on ML potentials. Two graph neural network potentials are tested here, MEGNet [44] and M3GNet [35], each combined with random search (RAS), Bayesian optimization (BO), and particle swarm optimization (PSO) for crystal structure prediction as implemented in the GNOA package [45].
2.1.3 AGOX with M3GNet potential
We adopt Atomistic Global Optimization X (AGOX) [13], a customizable and efficient global structure optimization framework with six global search algorithms implemented. By default, AGOX uses the effective medium theory (EMT) potential [46] to optimize and relax the generated candidate structures. For better comparison with other algorithms, we replace the simple EMT potential with the more powerful M3GNet [35] interatomic potential, which is based on graph neural networks, explicitly incorporates many-body interactions, and is much faster than DFT-based energy calculation [47, 48]. After each algorithm completes its structure search, the final optimized structure is further relaxed using M3GNet. An overview of the AGOX framework is shown in Figure 1. In our work, we use three different search algorithms: basin-hopping (BH), parallel tempering (PT), and random structure search (RSS). The global search algorithms are described below:
Random structure search:
Random structure search (RSS) is the simplest algorithm in AGOX. In each iteration, it generates a candidate at random and optimizes it locally. The relaxed candidate is then stored in the database.
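The RSS loop can be sketched in a few lines. In this illustrative sketch, a toy one-dimensional energy function and a crude coordinate-descent relaxer stand in for AGOX's structure generator, ML potential, and relaxation step; none of these names are AGOX's actual API.

```python
import random

def toy_energy(x):
    # Hypothetical 1-D potential standing in for an ML potential; minimum at x = 1.
    return (x - 1.0) ** 2

def local_relax(x, energy, step=0.01, iters=200):
    # Crude local descent standing in for a structure relaxation.
    for _ in range(iters):
        for dx in (-step, step):
            if energy(x + dx) < energy(x):
                x += dx
                break
    return x

def random_structure_search(energy, n_iters=20, seed=0):
    rng = random.Random(seed)
    database = []  # stores (energy, candidate) pairs, mirroring AGOX's database
    for _ in range(n_iters):
        candidate = rng.uniform(-5.0, 5.0)        # random candidate generation
        relaxed = local_relax(candidate, energy)  # local optimization
        database.append((energy(relaxed), relaxed))
    return min(database)  # best (lowest-energy) structure found

best_e, best_x = random_structure_search(toy_energy)
```

Each iteration is independent, so RSS parallelizes trivially; its weakness is that no information from earlier candidates guides later ones.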
Basin-hopping:
Basin-hopping (BH) [14] explores the configuration space by performing a series of jumps from one potential energy surface (PES) minimum to another, transforming the potential energy surface into a set of interpenetrating staircases. A sampler keeps track of previously evaluated candidates and provides the information needed to create new ones. The process begins by rattling a prior candidate to produce a new candidate, which is then relaxed locally. The Metropolis criterion then decides whether the new candidate is accepted as the starting point of the next iteration. The acceptance probability of the Metropolis criterion is defined by the following equation:
$A=\min\{1,\exp[\beta(E_{k-1}-E_{k})]\}$  (1)
where $\beta=1/k_{B}T$ with $k_{B}$ the Boltzmann’s constant, and $E_{k}$ is the energy of the structure found in iteration $k$.
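Equation (1) can be written out directly. The sketch below is a generic Metropolis acceptance rule rather than AGOX's implementation, with energies and temperature in arbitrary consistent units (Boltzmann's constant defaults to 1).

```python
import math
import random

def acceptance_probability(e_prev, e_new, temperature, k_b=1.0):
    # Metropolis criterion of Eq. (1): min{1, exp[beta * (E_{k-1} - E_k)]}
    beta = 1.0 / (k_b * temperature)
    return min(1.0, math.exp(beta * (e_prev - e_new)))

def metropolis_accept(e_prev, e_new, temperature, rng=random):
    # Downhill moves are always accepted; uphill moves with Boltzmann probability.
    return rng.random() < acceptance_probability(e_prev, e_new, temperature)
```

Higher temperatures make uphill moves more likely to be accepted, which is exactly the knob parallel tempering exploits.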
Parallel tempering
In the parallel tempering (PT) method, as described by Kofke et al. [15], simultaneous basin-hopping searches are conducted at different temperatures. This approach promotes exploration at elevated temperatures and exploitation at lower ones, and structures are swapped between temperatures to prevent stagnation. In this setup, multiple workers on different processors each perform a basin-hopping search at a specific temperature while sharing a single database. The probability of a structure swap between workers with adjacent temperatures every $N_{t}$ episodes is given by the following equation:
$A=\min\{1,\exp[(\beta_{i}-\beta_{j})(E_{i}-E_{j})]\}$  (2)
where $\beta_{i}=1/k_{B}T_{i}$ with $k_{B}$ the Boltzmann constant, and $E_{i}$ is the energy of the structure held by worker $i$.
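The swap criterion of Eq. (2) is equally compact; again this is a generic sketch rather than AGOX's code, with $k_{B}=1$ by default.

```python
import math

def swap_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    """Replica-swap acceptance of Eq. (2):
    min{1, exp[(beta_i - beta_j) * (E_i - E_j)]}."""
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    return min(1.0, math.exp((beta_i - beta_j) * (e_i - e_j)))
```

Note the symmetry of the rule: moving a lower-energy structure to the colder replica (and the higher-energy one to the hotter replica) is always accepted, which is what keeps cold replicas exploitative and hot replicas exploratory.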
Local GPR basinhopping
The basin-hopping search is enhanced by the use of a local Gaussian process regression (GPR) model [49] implemented within the AGOX framework. The local GPR model uses a radial basis function (RBF) kernel [50] with the smooth overlap of atomic positions (SOAP) [51] descriptor during the basin-hopping search.
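A minimal GPR with an RBF kernel illustrates the kind of surrogate involved. For simplicity, this sketch regresses on plain scalar inputs rather than SOAP descriptors, and AGOX's local-GPR machinery (sparsification, local environments) is omitted.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Radial basis function (squared-exponential) kernel matrix.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_predict(x_train, y_train, x_query, length_scale=1.0, noise=1e-8):
    # Standard GP regression posterior mean and variance.
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    alpha = np.linalg.solve(k_tt, y_train)
    mean = k_qt @ alpha
    var = 1.0 - np.einsum("ij,ji->i", k_qt, np.linalg.solve(k_tt, k_qt.T))
    return mean, var
```

At a training point the posterior mean reproduces the observed energy and the predicted uncertainty collapses toward zero, which is the property the acquisition functions below rely on.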
Evolutionary algorithm
Biological evolution theories serve as the foundation for evolutionary algorithms (EAs). The first step is to generate a population of potential solutions; each one is then evaluated using a fitness function to determine how effective it is. Over time, the population evolves and finds better solutions. In each iteration of the EA, a population of candidates is maintained and used as input to generate a new candidate, which is then relaxed. The sampler keeps a population of structurally diverse candidates so that they may serve as the parents of future candidates. The algorithm presented in [52] is used to select the population.
GOFEE: Bayesian Optimization
GOFEE is a Bayesian search algorithm that Bisbo and Hammer [16] developed as an effective technique for locating low-energy structures in computationally expensive energy landscapes, termed global optimization with first-principles energy expressions. GOFEE deploys a combination of an evolutionary search strategy and an actively learned surrogate model of the energy landscape. This allows many more structural queries than the target potential alone would permit, while only a significantly smaller number of evaluations with the target potential are performed, on the structures that the surrogate model deems most promising. These evaluations serve as training data to further refine the surrogate model. In GOFEE, a set of candidate structures is first locally optimized using a computationally inexpensive surrogate potential; a lower-confidence-bound acquisition function then selects candidates for evaluation with the true potential. Each episode of GOFEE generates $N$ candidates, which are locally optimized using a GPR potential, or more precisely the so-called lower-confidence-bound expression, defined by the following equation:
$E(x)=\hat{E}(x)-k\sigma(x)$  (3)
where $\hat{E}(x)$ and $\sigma(x)$ are the predicted energy and uncertainty of the GPR model for the structure represented by $x$, and $k$ is a parameter balancing exploration against exploitation.
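Candidate selection with the lower-confidence-bound expression of Eq. (3) can be sketched as follows. The candidate tuples and the default $k=2$ are illustrative choices, not GOFEE's actual defaults.

```python
def lower_confidence_bound(pred_energy, pred_std, k=2.0):
    # Eq. (3): optimistic (lower-confidence-bound) energy estimate used to
    # rank surrogate-relaxed candidates; k trades exploration vs. exploitation.
    return pred_energy - k * pred_std

def select_candidates(candidates, n_select, k=2.0):
    """candidates: list of (label, predicted_energy, predicted_std).
    Returns the n_select labels with the lowest LCB value."""
    ranked = sorted(candidates, key=lambda c: lower_confidence_bound(c[1], c[2], k))
    return [label for label, _, _ in ranked[:n_select]]
```

With a large $k$, a high-uncertainty candidate can outrank one with a lower predicted energy, which is how the acquisition function steers expensive evaluations toward unexplored regions.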
2.1.4 ParetoCSP
ParetoCSP [28] is based on the idea of the GNOA algorithm [9], with two major upgrades: a multi-objective GA search algorithm and the use of the M3GNet potential for energy calculation. Previous research on GNOA has shown that incorporating symmetry constraints expedites CSP. Similar to the GNOA approach, our method also performs crystal structure prediction with symmetry constraints. We incorporate two additional structural features, namely the crystal symmetry $S$ and the occupied Wyckoff position $W_{i}$ of each atom $i$. These features are selected from a collection of $229$ space groups and the associated $1506$ Wyckoff positions. The method begins by selecting a symmetry $S$ from the range of $P2$ to $P230$, followed by generating lattice parameters $L$ within the chosen symmetry. Next, a combination of Wyckoff positions $\{W_{i}\}$ is selected to fulfill the specified number of atoms in the cell. The atomic coordinates $\{R_{i}\}$ are then determined based on the chosen Wyckoff positions $\{W_{i}\}$ and lattice parameters $L$. To generate crystal structures, we need to tune the $S$, $\{W_{i}\}$, $L$, and $\{R_{i}\}$ variables.
By selecting different combinations of $S$, $\{W_{i}\}$, $L$, and $\{R_{i}\}$, one can generate a comprehensive array of possible crystal structures for a given composition $\{c_{i}\}$. In theory, computing the energy of all these structures and selecting the one with the lowest energy would yield the optimal crystal arrangement. However, exhaustively enumerating all these structures is practically infeasible due to the staggering number of potential combinations. To address this complexity, a more practical approach iteratively samples candidate structures from the design space, under the assumption that one of the sampled structures will emerge as the most stable and optimal solution. Consequently, an optimization strategy is adopted to guide this search towards the structure with the lowest energy. In particular, a genetic algorithm, NSGA-III [53], improved by incorporating AFPO [54] to enhance its performance and robustness, is utilized.
It starts by generating $n$ random crystals and assigning them an age of 1, where $n$ denotes the population size. One complete generation then goes through the following steps: calculating the energy and fitness of the structures, selecting parents, performing genetic operations, and updating the age. After a threshold of $G$ generations, the lowest-energy structure on the multi-dimensional Pareto front is chosen and further relaxed and symmetrized to obtain the final optimal structure. The genetic encoding is shown in the lower right corner of the flowchart. It contains the lattice parameters $a$, $b$, $c$, $\alpha$, $\beta$, and $\gamma$, the space group $S$, the Wyckoff position combination $W_{i}$, and the atomic coordinates $R_{i}$ of each atom indexed by $i$.
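The genetic encoding described above can be sketched as a simple container. The field names are illustrative and do not match ParetoCSP's actual code; the AFPO age attribute is included because every genome starts with age 1.

```python
from dataclasses import dataclass, field

@dataclass
class CrystalGenome:
    """Illustrative genetic encoding: lattice parameters, space group,
    Wyckoff-position combination, fractional coordinates, and AFPO age."""
    a: float; b: float; c: float
    alpha: float; beta: float; gamma: float
    space_group: int                                       # S, chosen from P2..P230
    wyckoff_positions: list = field(default_factory=list)  # {W_i}
    coordinates: list = field(default_factory=list)        # {R_i}
    age: int = 1                                           # AFPO age, initialized to 1
```

Crossover and mutation then operate on these fields (e.g., swapping lattice parameters between parents or perturbing coordinates), while the age field lets AFPO protect young genomes on the Pareto front.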
2.1.5 Template-based CSP
CSPML [29] is a machine-learning-based crystal structure prediction algorithm that uses metric learning [55] to automate the selection of template structures with high chemical replaceability for a given chemical composition from a database of stable structures. For a given formula, CSPML first restricts the candidates to structures with the same compositional ratio and then uses XenonPy [56] to calculate the compositional descriptors of the query formula and the templates; only the top five ranked templates are considered candidate structures. Of the 38 query compositions selected from the Materials Project database [57], 35 have candidates with probabilities greater than 0.5, and for 18 of them the template structure most similar to the true structure is ranked in the top five.
TCSP [17] is a template-based crystal structure prediction algorithm. For a given formula, TCSP first narrows down the candidates to structures with the same prototype and then uses the Element Mover's Distance (ElMD) [58] to measure the compositional similarity between the query formula and the compositions of all possible template structures. We integrate BERTOS [59], which achieves over 96.82% accuracy for all-element oxidation state prediction on the Inorganic Crystal Structure Database (ICSD), into TCSP's template search to improve the accuracy of the predicted oxidation states. Templates with identical oxidation states are then added to the final template list; if no such templates are found, the top five candidate structures are taken as the final templates. We apply the M3GNet potential to optimize the generated structures in this work.
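The template-selection logic shared by these methods (filter by compositional ratio or prototype, rank by a compositional distance, keep the top five) can be sketched as follows. The precomputed `distance` field is a toy stand-in for a real compositional metric such as ElMD, and the dictionary layout is hypothetical.

```python
def rank_templates(query_ratio, templates, top_k=5):
    """Toy template ranking: templates sharing the query's compositional
    ratio are sorted by a stand-in compositional distance (smaller = more
    similar) and the top_k closest are kept as candidate structures."""
    same_ratio = [t for t in templates if t["ratio"] == query_ratio]
    ranked = sorted(same_ratio, key=lambda t: t["distance"])
    return [t["id"] for t in ranked[:top_k]]
```

In the real pipelines, the selected templates are then element-substituted with the query composition and relaxed (here, with M3GNet) to produce the final predicted structures.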
2.1.6 DL-based CSP
AlphaCrystal-II [40] is a deep-learning-based crystal structure prediction algorithm built on the prediction of atomic pairwise distances and distance-matrix-based coordinate reconstruction [60]. This data-driven CSP algorithm exploits the implicit chemical and geometric rules embedded in existing crystal structures deposited in material databases such as the Materials Project or ICSD (for example, most cations are surrounded by anions). A deep neural network is trained to predict the distance matrix given only the composition; this matrix is then used as the objective target for a gradient-free optimization (Nevergrad [61]) based crystal structure reconstruction algorithm that searches for the atomic coordinates of the structure. The resulting candidate structures are then fed to the M3GNet-based fast structure relaxer to fine-tune the structures.
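The distance-matrix-based reconstruction step can be illustrated with a toy gradient-free search. This simple random-perturbation loop is a stand-in for the Nevergrad optimizer, works on non-periodic Cartesian coordinates, and omits the relaxation stage.

```python
import numpy as np

def dmat(coords):
    # Pairwise Euclidean distance matrix for an (N, 3) coordinate array.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def reconstruct(target, n_atoms, n_steps=3000, seed=0):
    """Toy reconstruction: random perturbation search that moves one atom at
    a time to minimize the mean absolute error between the candidate's
    distance matrix and the (predicted) target distance matrix."""
    rng = np.random.default_rng(seed)
    best = rng.uniform(0.0, 1.0, (n_atoms, 3))
    best_err = np.abs(dmat(best) - target).mean()
    for step in range(n_steps):
        sigma = 0.1 * (1.0 - step / n_steps) + 0.001  # shrinking step size
        cand = best.copy()
        i = rng.integers(n_atoms)
        cand[i] += rng.normal(0.0, sigma, 3)          # perturb a single atom
        err = np.abs(dmat(cand) - target).mean()
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

Because a distance matrix is invariant to rotation, translation, and reflection, the recovered coordinates match the original structure only up to a rigid motion, which is sufficient for the subsequent relaxation.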
2.2 CSPBenchmark test set design
To construct a balanced and effective benchmark dataset for crystal structure prediction, we carefully considered several key factors contributing to the complexity of this challenge. These factors include the total number of atoms and distinct elements within the compositions, the degree of symmetry as indicated by space groups, the prototype characterized by specific atomic ratios, and the shape and dimensions of the unit cell. Additionally, we accounted for the prevalence of similar compositions within established crystal structure databases to ensure comprehensive representation. We selected a total of 180 crystal structures, named CSP180, from the Materials Project database [57]. These structures are evenly distributed among binary, ternary, and quaternary compounds, ensuring a diverse and representative sample. The selected structures exhibit a wide variety of space groups, the most prevalent being space group 225, which appears 27 times; other common space groups include 139, 216, 221, and 194. Regarding the crystal system distribution, most structures belong to the cubic system, followed by the tetragonal and hexagonal systems; there are fewer occurrences of the orthorhombic, trigonal, and monoclinic systems, and a single instance of the triclinic system. Our selection process aimed to include structures with varying levels of prediction difficulty. The table below presents detailed information on the 36 selected binary test crystals, categorized into three difficulty levels: binary_easy, binary_medium, and binary_hard, with each category containing 12 structures. The criteria for difficulty classification include factors such as space group classification, template-based categorization, and the prototype ratios defining the crystal structures. The 180 crystal structures were chosen to cover a broad range of complexities and to provide a comprehensive benchmark for testing.
For example, binary compounds like DyCu and GaCo, which belong to space group 221 and exhibit cubic symmetry, were categorized as binary_easy due to their simpler and more predictable structures. In contrast, more complex structures, such as those with trigonal symmetry or multiple elements with varying oxidation states, were placed in higher difficulty categories. This careful selection ensures that the dataset includes not only easily predictable structures but also those that present significant challenges, thereby testing the robustness and accuracy of prediction algorithms. Additional test structures and their corresponding data are provided in the supplementary file for further reference.
\begin{table}[h]
\centering
\begin{tabular}{lllll}
\hline
Material id & Pretty formula & Space group & Crystal system & Category \\
\hline
mp-2334 & DyCu & 221 & Cubic & binary\_easy \\
mp-2226 & DyPd & 221 & Cubic & binary\_easy \\
mp-1121 & GaCo & 221 & Cubic & binary\_easy \\
mp-2735 & PaO & 225 & Cubic & binary\_easy \\
mp-1169 & ScCu & 221 & Cubic & binary\_easy \\
mp-30746 & YIr & 221 & Cubic & binary\_easy \\
mp-24658 & SmH$_{2}$ & 225 & Cubic & binary\_easy \\
mp-20225 & CePb$_{3}$ & 221 & Cubic & binary\_easy \\
mp-788 & Co$_{2}$Te$_{2}$ & 194 & Hexagonal & binary\_easy \\
mp-20176 & DyPb$_{3}$ & 221 & Cubic & binary\_easy \\
mp-1231 & Cr$_{6}$Ga$_{2}$ & 223 & Cubic & binary\_easy \\
mp-12570 & ThB$_{12}$ & 225 & Cubic & binary\_easy \\
mp-20132 & InHg & 166 & Trigonal & binary\_medium \\
mp-2209 & CeGa$_{2}$ & 191 & Hexagonal & binary\_medium \\
mp-30497 & TbCd$_{2}$ & 191 & Hexagonal & binary\_medium \\
mp-30725 & YHg$_{2}$ & 191 & Hexagonal & binary\_medium \\
mp-2731 & TiGa$_{3}$ & 139 & Tetragonal & binary\_medium \\
mp-2510 & ZrHg & 123 & Tetragonal & binary\_medium \\
mp-2740 & ErCo$_{5}$ & 191 & Hexagonal & binary\_medium \\
mp-570875 & Ga$_{4}$Os$_{2}$ & 70 & Orthorhombic & binary\_medium \\
mp-861 & Hf$_{4}$Ni$_{2}$ & 140 & Tetragonal & binary\_medium \\
mp-1566 & SmFe$_{5}$ & 191 & Hexagonal & binary\_medium \\
mp-2387 & Th$_{4}$Zn$_{2}$ & 140 & Tetragonal & binary\_medium \\
mp-1607 & YbCu$_{5}$ & 191 & Hexagonal & binary\_medium \\
mp-13452 & BePd$_{2}$ & 139 & Tetragonal & binary\_hard \\
mp-11359 & Ga$_{2}$Cu & 123 & Tetragonal & binary\_hard \\
mp-1995 & PrC$_{2}$ & 139 & Tetragonal & binary\_hard \\
mp-30501 & Ti$_{2}$Cd & 139 & Tetragonal & binary\_hard \\
mp-30789 & U$_{2}$Mo & 139 & Tetragonal & binary\_hard \\
mp-454 & NaGa$_{4}$ & 139 & Tetragonal & binary\_hard \\
mp-1827 & SrGa$_{4}$ & 139 & Tetragonal & binary\_hard \\
mp-2129 & Nd$_{2}$Ge$_{4}$ & 141 & Tetragonal & binary\_hard \\
mp-30682 & ZrGa & 141 & Tetragonal & binary\_hard \\
mp-2128 & Sn$_{8}$Pd$_{2}$ & 68 & Orthorhombic & binary\_hard \\
mp-1208467 & Tb$_{8}$Al$_{2}$ & 227 & Cubic & binary\_hard \\
mp-640079 & Mn$_{9}$Au$_{3}$ & 123 & Tetragonal & binary\_hard \\
\hline
\end{tabular}
\end{table}
2.3 Evaluation procedure and running parameters for different algorithms
We substituted DFT calculations with the M3GNet potential, a graph neural network-based surrogate potential model [35], to relax both the ground truth structure and the predicted structure; we then used the same model to compute the energy distance between these structures. The running parameters and configurations for all CSP algorithms are shown in Table S6 in the supplementary file.
2.4 Evaluation metrics
Evaluation metrics are essential in materials science research, as they quantitatively assess the performance and effectiveness of different methods. Numerous evaluation toolkits exist in molecular research, such as RDKit [62] and MOSES [63]; in materials informatics, however, there is no unified standard for evaluating new structures. Recently, we introduced a set of distance metrics for CSP performance comparison in benchmark studies [64], including the M3GNet energy distance, minimal RMSE distance, minimal MAE distance, RMS distance, anonymous RMS distance, Sinkhorn distance, Chamfer distance, Hausdorff distance, superpose RMSD distance, graph edit distance, and fingerprint distance, to standardize the training and comparison of material structure generation models. For test structures in the polymorph category, we employ a detailed evaluation approach: the predicted structure is compared with multiple ground truth structures, each representing a different polymorph. As each sample corresponds to multiple ground truth polymorphs, this yields several values of each evaluation metric per sample; to identify the most accurate prediction, we select the metric values associated with the ground truth structure that has the minimum M3GNet energy distance. This ensures that the selected metrics reflect the closest match to the predicted structure, providing a reliable measure of prediction accuracy. The distance metrics are listed below. Table 3 shows selected distance scores for various test samples generated by the AGOX-PT algorithm.
• Wyckoff position fraction coordinate RMSE distance
• Wyckoff minimal MAE distance
• M3GNet energy distance
• Pymatgen RMS distance
• Sinkhorn distance
• Chamfer distance
• Hausdorff distance
• Superpose RMSD distance
• CrystalNN fingerprint distance
• Edit graph distance
• XRD distance
• OFM distance
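Two of the geometric metrics above can be sketched directly on atomic-site coordinate arrays. The following is a minimal illustration (not the benchmark's implementation): the Chamfer distance averages nearest-neighbor distances in both directions, while the Hausdorff distance takes the largest nearest-neighbor deviation; whether plain or squared distances are averaged is an implementation choice, and plain Euclidean distances are shown here.

```python
import numpy as np

def pairwise_dists(a, b):
    # Euclidean distance matrix between an (n,3) and an (m,3) array of sites
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def chamfer_distance(a, b):
    # mean nearest-neighbor distance, symmetrized over both directions
    d = pairwise_dists(a, b)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff_distance(a, b):
    # maximum deviation: largest nearest-neighbor distance in either direction
    d = pairwise_dists(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Note that this sketch operates on raw Cartesian coordinates; a full implementation must also account for lattice periodicity and structure alignment.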
In addition to the above quantitative distance metrics, we also used Pymatgen’s StructureMatcher to calculate the success rate of crystal structure prediction by identifying whether similar structures exist in the MP database, using the following default parameters: ltol=0.2, stol=0.3, angle_tol=5, as in [65]. It is important to note that, unlike previous research, we found that StructureMatcher can incorrectly declare structure identity when two similar structures have different space groups (see the Discussion section). We therefore interpret the success rate alongside the space group match rate in our results. These additional evaluations provide a more comprehensive assessment of the generated structures across the different algorithms.
Ranking scores of algorithms:
To evaluate how well the different performance metrics reflect the actual closeness of the predicted structures to the ground truth, we employed the quantitative CSP distance metrics of [64] to assess the quality of all structures generated by the algorithms, and adopted a ranking scheme to compare the candidate CSP algorithms. For each test structure, all algorithms are first ranked by the quality of their predicted structures, i.e., their distances to the ground truth structure. Ranking scores on a 0-100 scale are then assigned using a standardized scoring method to ensure fairness. For example, with five algorithms under comparison, five evenly distributed scores ranging from 100 to 0 are assigned to the algorithms sorted from best to worst: the first-place algorithm receives a score of 100 (reflecting the smallest distance), the second-place algorithm earns 75, followed by 50, 25, and 0 for the third, fourth, and fifth places. When multiple algorithms produce structures with identical quality/distances, they are assigned the same rank and their scores are averaged: if the first- and second-place algorithms tie, both receive the average of 100 and 75 [(100 + 75)/2]; if all five algorithms tie, each receives the average of the five scores [(100 + 75 + 50 + 25 + 0)/5]. Figure 5 shows the ranking scores based on the overall average distances for each algorithm.
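The ranking scheme above can be sketched as a small function. This is an illustration of the stated rules, not the benchmark's own code; the convention of ranking algorithms with no valid prediction last (distance `None`) is our assumption, consistent with the penalization discussed later.

```python
def ranking_scores(distances):
    """Assign 0-100 ranking scores: smallest distance ranks first, ties averaged.

    `distances` maps algorithm name -> distance to the ground truth
    (None marks an algorithm that produced no valid structure; it ranks last).
    """
    n = len(distances)
    if n == 1:
        return {name: 100.0 for name in distances}
    # evenly spaced scores from 100 down to 0
    slots = [100 * (n - 1 - i) / (n - 1) for i in range(n)]
    # sort by distance, treating missing predictions as worst
    items = sorted(distances.items(),
                   key=lambda kv: float("inf") if kv[1] is None else kv[1])
    scores, i = {}, 0
    while i < n:
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1  # extend the run of tied distances
        avg = sum(slots[i:j + 1]) / (j - i + 1)  # average over tied ranks
        for k in range(i, j + 1):
            scores[items[k][0]] = avg
        i = j + 1
    return scores
```

For five algorithms where the top two tie, both tied algorithms receive (100 + 75)/2 = 87.5, matching the rule described above.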
Table 3: Selected distance scores of structures predicted by the AGOX-pt algorithm for representative test samples (each row lists a subset of the distance metrics defined above).
YHg_{2}  1.02  0.27  38.60  23.01  13.20  1.87  1.44  
ScCu  2.69  0.43  22.04  19.80  10.94  2.35  2.85  
K_{4}Na_{2}Ga_{2}P_{4}  0.22  0.31  92.57  11.88  1.98  1.75  1.13  
Re_{2}O_{6}  0.21  0.31  71.15  13.76  1.00  2.89  1.73  
CrFeCoSi  2.25  0.32  59.11  23.89  14.55  N/A  2.92  
Ba_{2}YRuO_{6}  1.07  0.31  154.98  23.12  16.07  2.54  1.95  
Li_{2}NiO_{2}  0.91  0.31  51.95  18.29  10.82  N/A  1.42  
PrC_{2}  1.76  0.35  23.67  14.92  8.06  1.42  1.43  
Ge_{12}Rh_{3}  0.03  0.35  207.02  24.41  15.53  1.84  1.61  
MgV_{4}SnO_{12}  0.39  0.30  181.78  15.46  1.28  1.93  1.35  
DyPd  2.46  0.35  16.97  15.52  8.65  2.33  2.37  
CeCr_{2}Si_{2}C  1.29  0.32  48.70  12.08  1.44  1.69  1.39  
Nb_{2}P_{2}Se_{2}  0.64  0.26  38.29  9.25  1.49  1.80  1.34  
BePd_{2}  2.23  0.29  24.23  15.05  8.42  2.47  1.61  
SrGa_{4}  0.56  0.27  44.16  16.13  9.16  N/A  1.38  
KLi_{6}IrO_{6}  0.55  0.30  162.23  15.92  1.40  2.64  1.23  
Hf_{4}Mn_{8}  1.66  0.33  185.07  23.21  15.24  2.32  1.58  
Fe_{2}Cu_{6}SnS_{8}  0.28  0.34  81.88  7.01  1.82  2.20  2.06  
KAs_{4}IO_{6}  0.27  0.35  145.19  20.16  13.39  2.17  1.33  
Ti_{2}Cd  2.50  0.39  26.25  14.15  8.65  2.36  1.71 
3 Results
3.1 Performance comparison of CSP algorithms over all test structures
We evaluated the performance of 13 CSP algorithms in predicting the structures of 180 test samples: TCSP, CSPML, ParetoCSP, AlphaCrystal-II, GNOA-M3GNet-RAS, GNOA-M3GNet-PSO, GNOA-M3GNet-BO, GNOA-MEGNet-RAS, GNOA-MEGNet-PSO, GNOA-MEGNet-BO, AGOX-rss, AGOX-pt, and AGOX-bh. Prediction coverage varied across the algorithms. TCSP achieved the highest coverage, returning predictions for all 180 structures, while each of the AGOX algorithms (AGOX-rss, AGOX-pt, and AGOX-bh) produced structures for 175 of the 180 samples. ParetoCSP and CSPML also performed well, predicting 173 and 158 structures, respectively. AlphaCrystal-II was limited to 121 structures because it cannot handle structures with more than 12 atoms. The GNOA algorithms showed weaknesses in predicting complex binary, ternary, and quaternary structures, with successful predictions ranging from 30 to 39 out of 180 samples.
To analyze and compare the performance of all CSP algorithms across the 180 test structures, we first calculated the StructureMatcher success rate using StructureMatcher from Pymatgen, with the default parameters ltol=0.2, stol=0.3, angle_tol=5, to determine whether similar materials already exist in the Materials Project database. As shown in Figure 2, the two template-based CSP algorithms, TCSP and CSPML, stand out for generating structures with space groups similar to the ground truth structures. CSPML and TCSP achieve the best performance, with success rates of 46.111% and 42.778%, respectively. In contrast, AlphaCrystal-II and ParetoCSP showed significantly lower performance, with StructureMatcher success rates of 13.333% and 11.111%, respectively. The GNOA algorithms generally underperformed, with success rates ranging from 0.556% to 4.444%.
Among the three M3GNet-based GNOA CSP algorithms, GNOA-M3GNet-PSO (particle swarm optimization) outperforms the BO (Bayesian optimization) and RAS (random search) variants, reflecting the importance of the search capability used in a CSP algorithm. The three MEGNet-based GNOA algorithms all perform poorly here owing to the lower accuracy of their MEGNet energy potential. We also evaluated the performance of the three AGOX algorithms using different optimization strategies: basin hopping (BH), parallel tempering (PT), and random search (RSS). The AGOX algorithms failed to predict any structures matching those in the MP database.
Symmetry prediction performance also plays an important role. We computed the space group match rate for each algorithm, which indicates whether the predicted structure has the same space group number as the ground truth structure. TCSP achieves the best performance with 57.778%, followed by CSPML with a space group match rate of 45.556%. The proficiency of template-based CSP algorithms in predicting crystal structures with identical symmetries may stem from two factors. The first is their adeptness at recognizing highly similar structure templates via oxidation-state and composition-based fingerprint matching. The second is the widespread existence of similar crystal structures with identical space groups and crystal systems, which makes it easy to find a template and use simple elemental substitution to determine the structure. This structural distribution pattern was exploited by the DeepMind team to discover more than 380,000 new hypothetical stable materials in their 2023 Nature report [66].
However, both template-based algorithms fail to find structures with the correct symmetry for a substantial fraction of the test set: CSPML for at least 98 materials (54%) and TCSP for 76 materials (42%), underscoring the pressing need for de novo CSP prediction algorithms.
Next, we found that the space group match rate and the StructureMatcher success rate are consistent across the remaining algorithms. Of the 11 de novo CSP algorithms, ParetoCSP and AlphaCrystal-II outperform all others, with space group match rates of 12.222% and 12.778%, respectively. These are 214.27% and 228.57% higher than that of GNOA-M3GNet-PSO, the best of the remaining nine de novo algorithms. These successes can be attributed to ParetoCSP’s strong global search capability, based on the age-fitness multi-objective genetic algorithm combined with the M3GNet deep learning potential [35], and to the contact-map-based deep learning algorithm AlphaCrystal-II, which exploits interatomic interaction patterns found in known crystal structures. The GNOA algorithms exhibited low space group match rates, ranging from 0.556% to 12.778%, and all AGOX algorithms had a space group match rate of 0.556%, indicating their inability to find structures with the same space group number as the ground truth.
Overall, current de novo algorithms based on machine learning potentials achieve only moderate performance in terms of success rate and space group prediction accuracy, indicating significant room for further development in this research area. More details on how many structures predicted by each algorithm share the ground truth’s space group and crystal system are given in Table S1 of the supplementary file.
To conduct a comprehensive analysis and comparison of each algorithm’s performance, we further evaluated the algorithms using a set of quantitative metrics proposed in our work [64]. First, we used the formation energy distances of the predicted structures compared to the ground truth as the performance metric for algorithm comparison, a method widely used in previous CSP work [4, 67, 22]. Using formation energy as a performance metric has a unique value as it can serve as a critical indicator of a crystal’s stability in nature. To efficiently analyze and validate the performance of structures generated by various algorithms, we computed the ranking score for each algorithm based on the M3GNet formation energy distance of each structure.
As shown in Figure 3, TCSP achieves the best performance in terms of the average M3GNet energy distance, with a score of 81.99. Close behind, ParetoCSP and CSPML achieve ranking scores of 79.87 and 78.91, respectively. The high ranking scores of TCSP and CSPML on the M3GNet energy distance are consistent with the symmetry prediction performance of these two template-based algorithms. ParetoCSP’s score of 79.87 is slightly below TCSP’s yet considerably higher than those of all the other de novo CSP algorithms. The three AGOX algorithms achieve ranking scores of 62.22, 64.15, and 63.59, respectively. AlphaCrystal-II achieves a ranking score of 58.46, slightly lower than the AGOX algorithms. Although the AGOX algorithms have the lowest StructureMatcher success rate and space group match rate, their higher M3GNet energy distance ranking scores compared to AlphaCrystal-II can be attributed to the larger number of structures they successfully generated. Furthermore, AlphaCrystal-II generates a larger total number of predicted structures than all GNOA algorithms; it surpasses GNOA-MEGNet-PSO by 830.89% and GNOA-M3GNet-PSO by at least 297.69%. The significantly lower ranking scores of all GNOA algorithms stem primarily from their limited ability to generate structures and their weakness in predicting structures similar to the ground truth, as reflected by the small fraction of successfully generated structures.
We further utilized the average Chamfer distance as a metric, calculated as the mean of the squared distances between each atomic site in one structure and its nearest neighbor in the other; it serves as a robust measure of structural congruence. By capturing the spatial correlation between atomic sites, the Chamfer distance offers a comprehensive measure of similarity, and its sensitivity to the details of crystallographic arrangements allows a nuanced evaluation of model performance across a diverse range of structures. The ranking scores based on the average Chamfer distances are depicted in Figure 4.
Among the algorithms evaluated, TCSP emerges as the top performer with the highest ranking score of 89.04, closely followed by CSPML with a score of 77.05. These template-based CSP algorithms excel in generating structures congruent with the ground truth, demonstrating strong predictive capability. ParetoCSP, with a ranking score of 77.99, also performs well thanks to its age-fitness Pareto genetic algorithm. The AGOX family (AGOX-bh, AGOX-pt, and AGOX-rss) shows relatively consistent performance, with scores ranging from 56.58 to 58.18, and AlphaCrystal-II achieves a comparable ranking score of 53.89. ParetoCSP, the AGOX family, and AlphaCrystal-II all significantly outperform the GNOA family of algorithms; AGOX-rss outperforms GNOA-M3GNet-BO by a substantial 293.90%. While the GNOA algorithms perform reasonably for simpler binary structures, their predictive accuracy diminishes for more complex structures.
This is evident in their lower average ranking scores on the Chamfer distance metric, ranging from 10.90 to 15.45, highlighting the challenge of accurately predicting complex crystal structures. Common limitations of current methods include dependency on template-based approaches and difficulty in predicting structures with complex symmetries and compositions, which underscores the need for more advanced de novo CSP prediction algorithms. Overall, the ranking scores based on the average Chamfer distances provide a meaningful evaluation of each algorithm’s ability to predict crystal structures that closely match the ground truth, with template-based approaches generally outperforming other methods on this test set.
Finally, to thoroughly and accurately assess each algorithm, we computed overall ranking scores based on all 12 distance metrics. As depicted in Figure 5, CSPML leads with a score of 71.27, followed closely by TCSP at 66.83. CSPML outperforms TCSP thanks to its use of both chemical composition descriptors and crystal structure descriptors. The AGOX family achieved scores of 55.25, 54.84, and 54.59, respectively, while ParetoCSP had a ranking score of 57.40. In contrast, AlphaCrystal-II received a slightly lower score of 48.34, and the GNOA family ranged from 8.27 to 12.41. These results underscore the varied capabilities of the algorithms, affirming the importance of integrating diverse descriptors for predictive accuracy while highlighting the challenges and potential areas for improvement in crystal structure prediction.
3.2 Performance comparison over binary structures
To further analyze the performance of the algorithms, we focus on their predictions across 60 binary structures, employing the M3GNet energy distance and the Hausdorff distance as key metrics. The Hausdorff distance is a structure similarity metric that represents the maximum deviation between two structures and has the advantage of being invariant to rigid transformations such as translations, rotations, and reflections. The average ranking scores derived from both metrics for these binary test structures are detailed in Figure 6, showcasing the average M3GNet energy distances and Hausdorff distances relative to the ground truth structures for all predictions. Among the algorithms, ParetoCSP achieved the highest ranking scores, with 83.66 for the M3GNet energy distance and 81.41 for the Hausdorff distance. TCSP also performed well, with ranking scores of 81.03 for the M3GNet energy distance and 82.69 for the Hausdorff distance, indicating strong performance on both metrics. CSPML, while slightly behind TCSP and ParetoCSP, still showed robust performance with ranking scores of 74.81 and 72.18 for the M3GNet energy distance and the Hausdorff distance, respectively. AlphaCrystal-II and the AGOX algorithms (AGOX-bh, AGOX-pt, and AGOX-rss) demonstrated relatively good performance, with ranking scores ranging from 54.94 to 60.00 for the M3GNet energy distance and from 51.67 to 55.51 for the Hausdorff distance. Additionally, the GNOA algorithms with the M3GNet potential (GNOA-M3GNet-RAS, GNOA-M3GNet-PSO, and GNOA-M3GNet-BO) generally achieved higher ranking scores on both metrics than the GNOA algorithms with the MEGNet potential. However, the GNOA algorithms still struggled with the more complex members of the binary category, especially those without a simple 1:1 atomic ratio.
Their ranking scores for the Hausdorff distance ranged from 16.41 to 25.28, highlighting the need for further improvements to better handle more intricate crystal structures. Overall, despite the relatively simpler nature of binary structures, some algorithms still faced challenges in accurately predicting their configurations. This underscores the importance of continuous improvement and the development of more robust predictive models.
3.3 Performance comparison over ternary structures
Given the relative simplicity of binary structures in our dataset, we extend our analysis to ternary and quaternary crystal structures to evaluate the strengths and weaknesses of the CSP algorithms across different structural complexities. We compared the algorithms on ternary structures using ranking scores based on the M3GNet energy distance and the Hausdorff distance. As shown in Figure 7, CSPML and ParetoCSP achieve the highest M3GNet energy distance ranking scores of 87.76 and 82.76, respectively. Meanwhile, TCSP attains the highest Hausdorff distance ranking score of 89.17, with a closely comparable score of 80.45 on the M3GNet energy distance. Similar to their performance on binary structures, the AGOX algorithms exhibited lower scores, ranging from 60.77 to 61.60 on the M3GNet energy distance and from 57.44 to 58.27 on the Hausdorff distance. AlphaCrystal-II achieved ranking scores of 62.05 on the M3GNet energy distance and 56.41 on the Hausdorff distance. The GNOA algorithms again faced significant limitations in predicting more complex structures, with low ranking scores on both metrics, ranging from 3.59 to 14.42. Comparing the ranking scores in Figure 7 to those for binary test structures (Figure 6), the template-based algorithms and ParetoCSP maintain similar scores, whereas the scores for the AGOX algorithms and AlphaCrystal-II increase, indicating improved relative performance on more complex structures. These gains come at the cost of the GNOA family’s ranking scores, clearly indicating the need for further development and refinement of the GNOA algorithms to enhance their predictive accuracy and reliability on complex crystal structures.
3.4 Performance comparison over quaternary structures
We evaluated the ranking scores for quaternary structure predictions across all CSP algorithms using the M3GNet energy distance and the Hausdorff distance as metrics; Figure 8 illustrates the ranking scores for each algorithm. Using the Hausdorff distance, the TCSP algorithm achieved the highest ranking score of 91.73, followed by CSPML and ParetoCSP with scores of 71.86 and 67.44, respectively. Scores for the AGOX family are competitive, at 66.03, 65.51, and 65.38. ParetoCSP, on the other hand, struggles with quaternary structures, as reflected by its lower score compared to its ranking scores for binary (Figure 6) and ternary structures (Figure 7). The GNOA algorithms, encompassing six distinct variants, record scores ranging from 4.10 to 12.44, indicating significant difficulty in predicting quaternary compounds. When evaluated on the M3GNet energy distance, TCSP again leads with the highest ranking score of 84.49, while CSPML, ParetoCSP, and the AGOX algorithms demonstrate competitive performance with scores from 67.24 to 74.17; seven algorithms outperformed the GNOA family, reflecting the inherent challenge of predicting quaternary compounds. This comprehensive analysis highlights the strengths and limitations of the CSP algorithms across varying structural complexities, emphasizing the need for continued improvement and refinement to handle more complex crystal structures effectively. Additional performance comparisons using the Sinkhorn distance, superpose RMSD, Wyckoff RMSE, XRD distance, and OFM distance across binary, ternary, and quaternary test structures are provided in Figures S2, S3, and S4 in the supplementary file.
Algorithm  CALYPSO  CSPML  ParetoCSP  AGOX-pt  
primitive formula  mpid  ED  HD  ED  HD  ED  HD  ED  HD  
Ca_{3}SnO  mp29241  0.002  2.413  0.001  0.021  0.001  0.023  1.099  9.715  
Co_{2}Ni_{2}Sn_{2}  mp20237  0.061  5.489  0.000  2.557  0.002  0.056  1.210  15.062  
Co_{2}Te_{2}  mp788  0.028  6.520  0.220  2.475  0.050  4.573  0.879  20.100  
Cr_{6}Ga_{2}  mp1231  2.016  7.001  0.096  5.710  0.015  1.622  1.494  6.864  
Hf_{4}Mn_{8}  mp11449  0.002  6.383  0.129  8.715  0.266  6.457  1.660  15.644  
Hf_{4}Ni_{2}  mp861  0.014  4.064  1.274  11.162  0.039  7.752  1.823  11.395  
HfCo_{2}Sn  mp20730  0.054  3.928  0.002  0.046  0.038  9.083  2.175  16.670  
InHg  mp20132  0.012  10.296  0.015  7.968  0.069  7.379  0.191  12.479  
Li_{2}CuSn  mp30591  0.004  3.933  0.111  0.129  0.012  8.105  0.818  16.551  
LiMg_{2}Ga  mp30648  0.031  7.062  0.000  2.892  0.032  9.908  0.773  19.834  
MgCu_{4}Sn  mp3676  0.006  3.194  0.167  5.256  0.085  3.881  0.942  16.160  
MgInCu_{4}  mp30587  0.070  4.861  0.010  1.704  0.079  5.029  0.986  20.072  
NaGa_{4}  mp454  0.021  2.473  0.388  5.206  0.009  1.236  0.297  9.522  
ScCu  mp1169  0.004  1.701  0.108  3.681  0.000  0.006  2.694  11.775  
SrGa_{4}  mp1827  0.003  2.685  0.722  6.777  0.009  2.229  0.565  10.163  
SrGaCu_{2}  mp30580  0.000  8.402  0.196  4.749  0.075  4.853  1.026  15.771  
Ti_{2}Cd  mp30501  0.041  3.755  0.061  1.064  0.010  5.197  2.497  8.648  
TiGa_{3}  mp2731  0.023  2.348  0.006  8.246  0.002  0.221  1.296  11.730  
Y_{3}Al_{9}    0.001  0.011  0.002  3.022  0.001  3.723  0.893  21.138  
YHg_{2}  mp30725  0.001  1.747  0.006  0.044  0.008  1.741  1.025  13.130  
Zn_{2}C_{2}O_{6}  mp9812  0.054  10.398  0.008  3.995  0.888  11.537  0.679  13.052  
ZnCdPt_{2}  mp30493  0.008  0.134  0.086  8.328  0.010  2.038  1.212  10.743  
ZrHg  mp2510  0.010  4.172  0.004  0.463  0.016  4.428  1.719  7.787  
# of the best  12  5  7  11  7  7  0  0 
3.5 Performance comparison of non-DFT-based CSP algorithms against DFT-based CALYPSO
Due to its extremely demanding computational cost, it is not feasible to evaluate the DFT-based CALYPSO algorithm over all 180 test samples; most test samples are too complex for CALYPSO to predict accurately. We therefore selected a subset of 13 binary structures and 10 ternary structures for evaluating the DFT-based algorithm against the non-DFT-based CSP algorithms. The test set includes NaGa_{4}, Ti_{2}Cd, Y_{3}Al_{9}, ZrHg, YHg_{2}, TiGa_{3}, SrGa_{4}, ScCu, InHg, Hf_{4}Mn_{8}, Hf_{4}Ni_{2}, Cr_{6}Ga_{2}, Co_{2}Te_{2}, Zn_{2}C_{2}O_{6}, Ca_{3}SnO, ZnCdPt_{2}, SrGaCu_{2}, MgInCu_{4}, MgCu_{4}Sn, LiMg_{2}Ga, Li_{2}CuSn, HfCo_{2}Sn, and Co_{2}Ni_{2}Sn_{2}. We chose the M3GNet energy distance (ED) and the Hausdorff distance (HD) as evaluation metrics to compare CALYPSO, CSPML, ParetoCSP, and AGOX-pt. Notably, two structures, Hf_{4}Mn_{8} and Y_{3}Al_{9}, are polymorphic; Y_{3}Al_{9} therefore has two ground truth structures (mp2451 and mp11231). The comparison results are shown in Table 4. We find that CALYPSO achieves the lowest M3GNet energy distances for 12 of the 23 test samples, ranging from 0.000 to 0.070 eV/atom, covering 8 binary and 4 ternary structures. It also records the smallest Hausdorff distance for 5 of the 23 test samples. This demonstrates the advantage of a de novo CSP algorithm with DFT energy calculation on this small-scale test set and its ability to find lower-energy structures. However, non-DFT-based algorithms such as CSPML and ParetoCSP show competitive performance as well, each achieving the lowest energy distance for 7 test samples. CSPML achieves the best geometric performance, with the lowest HD for 11 of the 23 samples, reflecting the effectiveness of template-based CSP in identifying ground truth structures through similar template structures.
Although CALYPSO performs better on ED by utilizing DFT calculations to find lower-energy structures, CSPML achieves high performance in both ED and HD for many ternary structures, such as Ca_{3}SnO, Co_{2}Ni_{2}Sn_{2}, HfCo_{2}Sn, LiMg_{2}Ga, MgInCu_{4}, and Zn_{2}C_{2}O_{6}. Among the remaining two de novo CSP algorithms, ParetoCSP significantly outperformed AGOX-pt, achieving the best ED and HD for 7 of the 23 test samples each, whereas AGOX-pt did not achieve the best score on any sample for either metric. All the HD scores of AGOX-pt are large, ranging from 6.864 Å to 21.138 Å, indicating its weakness in predicting structures geometrically similar to the ground truth.
The CSP performance comparison results need to be interpreted holistically, as good performance on a single metric can be misleading. As shown in Figure 9, we calculate three types of success rates for CSP prediction: the StructureMatcher success rate, the space group match rate, and the consensus match rate (both StructureMatcher and the space groups must match between predicted structures and the ground truths). CALYPSO achieves the best StructureMatcher success rate of 43.478%. However, its space group match rate is lower than those of CSPML and ParetoCSP, leading to a relatively low consensus success rate of 17.391%. In contrast, the consensus success rates of CSPML and ParetoCSP are 26.087% and 21.739%, respectively, thanks to their higher space group match rates. Overall, the template-based CSP algorithm CSPML shows the best performance, while the de novo CSP algorithm ParetoCSP achieves competitive performance in terms of the consensus success rate. The poor performance of AGOX-pt shows its inability to accurately predict the structures of the given compositions. It should be noted that space group determination depends on the parameter settings adopted, which may change the space group success rates in Figure 9; here the default parameters of Pymatgen’s space group analyzer are used. We also note that predicted structures with incorrect space groups may be fine-tuned into ones with correct space groups using DFT-based relaxation procedures.
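The three success rates above can be sketched from per-sample boolean outcomes. This is a minimal illustration, assuming the StructureMatcher and space-group comparisons have already been run for each test sample; the function name and inputs are ours.

```python
def success_rates(matcher_ok, spacegroup_ok):
    """Compute the three success rates (in %) from per-sample boolean results:
    StructureMatcher rate, space group match rate, and consensus rate
    (a sample counts toward consensus only if both checks pass)."""
    assert len(matcher_ok) == len(spacegroup_ok)
    n = len(matcher_ok)
    sm_rate = 100 * sum(matcher_ok) / n
    sg_rate = 100 * sum(spacegroup_ok) / n
    # consensus: both StructureMatcher and space group must match
    consensus = 100 * sum(m and s for m, s in zip(matcher_ok, spacegroup_ok)) / n
    return sm_rate, sg_rate, consensus
```

This makes explicit why a high StructureMatcher rate alone (as for CALYPSO) can coexist with a low consensus rate: the consensus counts only samples passing both checks.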
3.6 Case studies
Our benchmark results have demonstrated the limited prediction capability of current computational CSP algorithms, both template-based and de novo. In particular, most de novo CSP algorithms cannot accurately predict the space groups for the majority of test samples. To further understand the success and failure cases of different CSP algorithms and how the performance metric scores correlate with the predicted structures, we present three case studies: ErCo_{5}, Ca_{3}SnO, and KAs_{4}IO_{6}. Two additional case studies, ZnCdPt_{2} and Ga_{2}Cu, are provided in the supplementary file (Figures S5 and S6 and Tables S8 and S9) for further reference.
First, we compared the predicted structures of seven algorithms for ErCo_{5}, as shown in Figure 10 and Table 5. Two of the seven algorithms achieved successful predictions: ParetoCSP (Figure 10(b)) and TCSP (Figure 10(c)). Their predicted structures closely match the ground truth structure of ErCo_{5} obtained from the Materials Project database, as reflected by the small distance scores in the first two rows of Table 5. The structures predicted by AlphaCrystal-II (Figure 10(d)) and GNOA-M3GNet-PSO (Figure 10(f)) are close to the target structure and have a formation energy distance of 0.000 eV/atom; however, their Sinkhorn distance, Chamfer distance, and superpose RMSD are much higher than those of ParetoCSP and TCSP, highlighting the importance of interpreting structure similarity with a comprehensive set of criteria. On the other hand, the structures predicted by CSPML (Figure 10(e)), GNOA-MEGNet-RAS (Figure 10(g)), and AGOX-rss (Figure 10(h)) have high energy distance scores as well as high values on the other distance metrics, showing that a large energy distance can serve as an indicator of low-quality predictions. Notably, AGOX-rss displays much higher distance scores across all metrics than the other algorithms, indicating poor performance in predicting the structure of ErCo_{5}. This case, together with the benchmark studies, underscores the importance of using a set of quantitative criteria to comprehensively evaluate the performance of CSP algorithms.
Algorithm  ED (eV/atom)  Sinkhorn (Å)  Chamfer (Å)  Superpose RMSD (Å)  Fingerprint  XRD  OFM  
ParetoCSP  0.000  4.713  0.814  0.838  0.011  0.163  0.007  
TCSP  0.000  4.744  0.813  0.838  0.007  0.170  0.006  
AlphaCrystal-II  0.000  9.036  3.012  1.178  0.016  0.240  0.011  
CSPML  0.097  24.634  5.270  1.791  1.492  2.060  0.350  
GNOA-M3GNet-PSO  0.000  9.037  3.012  1.178  0.008  0.138  0.006  
GNOA-MEGNet-RAS  2.215  15.440  4.189  1.694  2.148  1.686  1.831  
AGOX-rss  1.880  85.596  26.181  14.652  1.950  1.650  2.282
Next, we chose a ternary structure to compare the DFT-based CSP algorithm CALYPSO with the two template-based CSP algorithms, CSPML and TCSP. Figure 11 shows that the structure of Ca_{3}SnO predicted by CSPML is more similar to the ground truth than those predicted by CALYPSO and TCSP. The formation energy distances are small for CSPML, CALYPSO, and TCSP alike. However, CSPML has much lower geometric distances, including a Sinkhorn distance of 0.071 Å, a Chamfer distance of 0.029 Å, and a superpose RMSD of 0.010 Å. Additionally, its fingerprint distance of 0.000 is far better than those of CALYPSO and TCSP, at 0.116 and 1.172 respectively, indicating the consistency of the different types of CSP metrics used in this study; this consistency also applies to the OFM performance metric. However, the XRD distance of CSPML’s prediction is worse than that of CALYPSO’s, indicating that the XRD distance alone is not a reliable metric for CSP prediction evaluation.
Algorithm  ED (eV/atom)  Sinkhorn (Å)  Chamfer (Å)  Superpose RMSD (Å)  Fingerprint  XRD  OFM  
CSPML  0.001  0.071  0.029  0.010  0.000  0.992  0.007  
CALYPSO  0.002  7.946  2.899  1.614  0.116  0.943  0.025  
TCSP  0.007  80.852  5.119  1.125  1.172  1.052  0.176 
To further understand the advantages of the different algorithms, we examined the case of quaternary structure prediction for KAs_{4}IO_{6}, in which the template-based methods work well while the de novo methods fail (Figure 12). Both CSPML and TCSP produced reasonable structures, and CSPML’s prediction is better overall despite TCSP’s result having a slightly lower XRD distance (1.039 compared to 1.040; see Table 7). In contrast, the structure predicted by ParetoCSP is significantly worse across all performance metrics: its much higher Sinkhorn distance of 207.028 Å, Chamfer distance of 23.239 Å, and superpose RMSD of 20.014 Å indicate poor geometric similarity to the ground truth structure (Figure 12(d)).
Table 7: Performance metrics for the predicted structures of KAs_{4}IO_{6}Cu (geometric distances in Å).

Algorithm | Energy distance | Sinkhorn distance | Chamfer distance | Superpose RMSD | Fingerprint distance | XRD distance | OFM distance
TCSP | 0.000 | 1.403 | 0.234 | 0.037 | 0.143 | 1.039 | 0.025
CSPML | 0.000 | 1.403 | 0.234 | 0.037 | 0.143 | 1.040 | 0.025
ParetoCSP | 0.705 | 207.028 | 23.239 | 20.014 | 2.254 | 2.529 | 0.521
4 Discussion
Objective and accurate evaluation and comparison of different CSP algorithms is non-trivial due to the complexity of structure comparison, the inherent symmetry of crystal structures, and the possible polymorphism of a given test structure. Here we highlight several aspects that require special attention when interpreting the evaluation results, along with issues that may arise during CSP algorithm performance evaluation.
Polymorphism test samples:
In our benchmark set, several test structures have alternative structures with the same composition due to structural polymorphism. For these test structures, the predicted structure of a CSP algorithm is compared to each of the polymorphic ground truth structures, and the one with the smallest distance is selected to calculate the distance error. Note that in this benchmark study, we only consider the top-1 prediction performance.
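The polymorph-aware scoring described above amounts to selecting the most favorable ground truth match; a minimal sketch (where `distance_fn` is a placeholder for any of the distance metrics used in this study):

```python
def best_polymorph_distance(predicted, polymorph_ground_truths, distance_fn):
    """Compare a predicted structure against every polymorph sharing the
    composition and report the smallest (most favorable) distance."""
    return min(distance_fn(predicted, gt) for gt in polymorph_ground_truths)

# Toy stand-in: "structures" are numbers, distance is absolute difference.
print(best_polymorph_distance(5, [1, 4, 9], lambda a, b: abs(a - b)))  # 1
```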
Ranking score bias:
We would like to point out that the rankings of different CSP algorithms sorted by our ranking scores, as shown in Figure 3, need to be interpreted cautiously. For example, the AGOX series of algorithms ranks higher than the GNOA series in Figures 3 to 8, while in Figure 2 the AGOX algorithms have a zero success rate according to the StructureMatcher criterion, whereas the GNOA algorithms successfully predicted several test structures. This discrepancy arises because our ranking score penalizes algorithms that cannot predict any valid structure. In this case, GNOA fails to find valid structures for many test structures, leading to its low rank.
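To illustrate why such a penalty can invert the ordering suggested by success rates alone, here is a toy average-rank scheme (an assumption for illustration, not the paper's actual ranking formula) in which an algorithm that produces no valid structure for a test case receives the worst possible rank for that case:

```python
def average_ranks(results):
    """results: dict alg -> dict test_id -> metric value (lower is better),
    or None when the algorithm produced no valid structure for that case.
    A missing prediction is penalized with the worst rank (= #algorithms)."""
    algs = list(results)
    tests = {t for per_alg in results.values() for t in per_alg}
    worst = len(algs)
    totals = {a: 0.0 for a in algs}
    for t in tests:
        scored = sorted((a for a in algs if results[a].get(t) is not None),
                        key=lambda a: results[a][t])
        rank = {a: i + 1 for i, a in enumerate(scored)}
        for a in algs:
            totals[a] += rank.get(a, worst)  # penalty for missing predictions
    return {a: totals[a] / len(tests) for a in algs}

scores = average_ranks({"A": {"t1": 0.1, "t2": None, "t3": None},
                        "B": {"t1": 0.5, "t2": 0.5, "t3": 0.5}})
# B ranks ahead of A on average despite A winning the one case it solved.
print(scores["B"] < scores["A"])  # True
```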
Cautious interpretation of StructureMatcher results:
Several studies have used Pymatgen's StructureMatcher to check whether two structures are identical [68, 65]. In the CDVAE experiments [68], two structures are deemed identical if StructureMatcher returns true with the parameters ltol=0.3, stol=0.5, and angle_tol=10. In a recent study [65], a more stringent parameter setting of ltol=0.2, stol=0.3, and angle_tol=5 was used. However, we find that there are many cases in which StructureMatcher reports two structures as identical while their space group numbers are not even equal (see Supplementary Tables S10 and S11 for examples). It is thus critical to check the space groups even when StructureMatcher reports two structures as identical, although space group determination itself also depends on a given set of parameters (usually the default values).
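The stricter acceptance test argued for here can be sketched as a combined criterion: a prediction counts as correct only when the geometric matcher fires and the space group numbers agree. In the sketch below, the boolean input stands in for the output of a StructureMatcher-style fit check and the integers for space group numbers from a symmetry analyzer; the tolerance presets are the ones quoted above:

```python
# Tolerance presets quoted in the text (fractional length/site tolerances
# and an angle tolerance in degrees).
CDVAE_TOLERANCES = {"ltol": 0.3, "stol": 0.5, "angle_tol": 10}
STRICT_TOLERANCES = {"ltol": 0.2, "stol": 0.3, "angle_tol": 5}

def accept_prediction(matcher_identical, spg_predicted, spg_ground_truth):
    """Accept only when the geometric matcher agrees AND the space group
    numbers coincide (the extra check advocated in the text)."""
    return matcher_identical and spg_predicted == spg_ground_truth

# A matcher "hit" with mismatched space groups is rejected.
print(accept_prediction(True, 225, 221))  # False
print(accept_prediction(True, 225, 225))  # True
```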
CrystalNN Fingerprint dependent structure similarity:
Another structure-identity checking method, based on structure fingerprint distances (CrystalNNFingerprint), was also used in a CSP study [69]. To calculate the similarity between two structures i and j, the given structures are first encoded into vector-type structural descriptors that aggregate local coordination information (site fingerprints) from all sites. The structure dissimilarity $\tau$ is then calculated as the Euclidean distance between the two structure descriptors. Structures with dissimilarity $\tau\leq 0.2$ were treated as similar structures in that study [69]. However, we find that this measurement has limitations: structure pairs with a fingerprint distance below the 0.2 threshold can still have different space groups (see Supplementary Figure S8 and Table S12), so a space group check with a specific (or default) set of parameters is needed, similar to the case of using Pymatgen's StructureMatcher.
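The dissimilarity $\tau$ described above is simply the Euclidean distance between the two fingerprint vectors; a minimal sketch, with short toy vectors standing in for actual CrystalNN fingerprints:

```python
import math

def structure_dissimilarity(fp_i, fp_j):
    """Euclidean distance between two structure fingerprint vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_i, fp_j)))

tau = structure_dissimilarity([0.1, 0.2, 0.3], [0.1, 0.25, 0.3])
print(tau <= 0.2)  # True: treated as "similar" under the 0.2 threshold
```

As the text notes, a small $\tau$ does not guarantee matching space groups, so this check should be paired with an explicit symmetry comparison.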
5 Conclusion
Crystal structure prediction plays a crucial role in discovering novel functional materials. However, conventional first-principles-based CSP algorithms currently have limited capability to predict complex crystal structures. Here we conduct a comprehensive benchmark study of 13 CSP algorithms covering template-based, ML-potential-based, contact-map-based, and DFT-based CSP algorithms, aiming to illustrate the potential and performance gaps of modern ML-potential and deep-learning-based CSP methods, as well as the widely used template-based ones, in terms of scalability and accuracy. The algorithms are evaluated on a well-selected test set of 180 binary, ternary, and quaternary crystal structures with diverse symmetries and numbers of atoms per unit cell. All algorithm performances are calculated using a set of quantitative metrics along with their relative ranking scores over the 180 test structures, making it possible to achieve relatively objective performance comparisons.
Our extensive benchmark experiments have uncovered several performance trends and factors that contribute to better CSP performance. First, we find that template-based CSP algorithms can achieve strong performance when a suitable template structure can be found, owing to the ubiquitous existence of typical structural prototypes [18, 70] and the wide application of such elemental-substitution CSP algorithms in discovering large numbers of hypothetical materials [35, 66]. However, template-based CSP algorithms cannot discover materials with novel structural prototypes. Next, we observe that ML-potential-based CSP algorithms have made significant progress in the past few years, leading to competitive algorithms for CSP, especially for targets without good templates. The performance of these algorithms strongly depends on the global search capability of the search algorithm and the quality of the ML potential. For example, with the same M3GNet potential, ParetoCSP outperforms both the AGOX and GNOA algorithms in terms of quantitative metrics due to its enhanced search capability. A further comparison of the ML-potential-based CSP algorithms with the DFT-based CALYPSO shows that the former class of modern CSP algorithms demonstrates strong performance and outperforms the latter on most of the test samples. Even for the relatively simple crystal structure test sets, both the template-based and ML-potential-based CSP algorithms showed better performance than the DFT-based CALYPSO in terms of formation energy distance and Hausdorff distance. However, our benchmark results also show that all current de novo CSP algorithms (non-template methods) are still at an early stage of development: most of them cannot even accurately predict the space group or crystal system for a majority of the 180 test samples.
Our evaluation of the DFT-based CALYPSO also revealed the lack of scalability of such DFT-based de novo CSP algorithms, as it is almost infeasible to complete predictions for all 180 test samples (it should be noted that modern CALYPSO has also incorporated ML potentials for more scalable CSP). However, DFT-based de novo CSP algorithms have a unique advantage in predicting crystal structures under special conditions such as high pressure, for which no ML potential is currently available. It should also be noted that, due to the subjectivity involved in selecting the 180 test samples, our evaluation has inherent bias despite our effort to cover diverse structures, symmetries, and structural prototypes. The rankings of different algorithms should therefore not be used to judge the superiority of any algorithm, but rather to guide the selection of appropriate algorithms for a given application scenario. For example, for a given composition, it is reasonable to first predict its structure using template-based algorithms such as CSPML and ML-potential-based de novo algorithms such as ParetoCSP or GNOA, and check the formation energy, energy-above-hull, and mechanical stability of the results. If these are still not satisfactory, one can then try DFT-based de novo methods such as CALYPSO, provided the composition is not too complex.
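The staged strategy suggested above can be sketched as a simple driver loop; the function arguments and the `is_satisfactory` check are hypothetical placeholders for illustration, not part of any of the benchmarked codes:

```python
def csp_workflow(composition, template_csp, ml_csp, dft_csp, is_satisfactory):
    """Hypothetical driver for the staged strategy described above:
    try cheap template-based and ML-potential-based predictors first,
    falling back to an expensive DFT-based search only if needed.
    Each predictor maps a composition to a structure (or None on failure);
    is_satisfactory encapsulates checks such as formation energy,
    energy-above-hull, and mechanical stability."""
    for method in (template_csp, ml_csp):
        structure = method(composition)
        if structure is not None and is_satisfactory(structure):
            return structure
    return dft_csp(composition)  # last resort: DFT-based de novo search

# Toy run: the template method fails, the ML method succeeds.
result = csp_workflow("Ca3SnO",
                      lambda c: None,            # no suitable template found
                      lambda c: "ml-structure",  # ML-potential prediction
                      lambda c: "dft-structure",
                      lambda s: True)
print(result)  # ml-structure
```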
Overall, our benchmark has demonstrated the significant progress of ML-potential-based CSP algorithms and their promising prospects for achieving scalable CSP, driven by the emergence of better search algorithms and modern machine learning potentials. Our benchmark data, quantitative performance evaluation metrics, the open-sourced codes of these CSP algorithms, and their independence from DFT calculations thus pave the way for researchers from the AI, data science, and statistics communities to explore this promising and significant CSP problem.
6 Data and Code Availability
The 180 test structures are obtained from the Materials Project database. Their mp-ids are available from our GitHub repository https://github.com/usccolumbia/cspbenchmark. The code for calculating ranking scores can also be downloaded from the same repository. The performance metrics are calculated using the code from the CSPBenchMetrics repository https://github.com/usccolumbia/CSPBenchMetrics. The open-source CSP codes are available from their corresponding websites, as shown in Table 1 in the main text. We have modified AGOX and GNOA to integrate them with the M3GNet neural network potential model.
7 Contribution
Conceptualization, J.H.; methodology, J.H., L.W., S.O., R.D., N.F., Y.S., E.S., M.X.; software, L.W., S.O., Y.S.; resources, J.H.; writing–original draft preparation, J.H., L.W., S.O., N.F., R.D., E.S., M.X.; writing–review and editing, J.H., R.D., S.O., N.F.; visualization, L.W., S.O.; supervision, J.H.; funding acquisition, J.H.
Acknowledgement
We would like to thank Prof. Yanchao Wang of Jilin University for helpful discussions and suggestions. The research reported in this work was supported in part by the National Science Foundation under grants 2110033, OAC-2311203, and 2320292. The views, perspectives, and content do not necessarily represent the official views of the NSF.
References
 [1] Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics, 89(12):1607–1617, 2021.
 [2]John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov,Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, AugustinŽídek, Anna Potapenko, etal.Highly accurate protein structure prediction with alphafold.Nature, 596(7873):583–589, 2021.
 [3]DavidH Bowskill, IsaacJ Sugden, Stefanos Konstantinopoulos, ClaireS Adjiman,and ConstantinosC Pantelides.Crystal structure prediction methods for organic molecules: State ofthe art.Annual Review of Chemical and Biomolecular Engineering,12:593–623, 2021.
 [4] Colin W. Glass, Artem R. Oganov, and Nikolaus Hansen. USPEX—evolutionary crystal structure prediction. Computer Physics Communications, 175(11–12):713–720, 2006.
 [5]Giancarlo Trimarchi, ArthurJ Freeman, and Alex Zunger.Predicting stable stoichiometries of compounds via evolutionaryglobal spacegroup optimization.Physical Review B, 80(9):092101, 2009.
 [6]Xiangyang Liu, Haiyang Niu, and ArtemR Oganov.Copex: coevolutionary crystal structure prediction algorithm forcomplex systems.npj Computational Materials, 7(1):199, 2021.
 [7] Detlef W. M. Hofmann and Joannis Apostolakis. Crystal structure prediction by data mining. Journal of Molecular Structure, 647(1–3):17–39, 2003.
 [8]ChristopherC Fischer, KevinJ Tibbetts, Dane Morgan, and Gerbrand Ceder.Predicting crystal structure by merging data mining with quantummechanics.Nature materials, 5(8):641–646, 2006.
 [9]Guanjian Cheng, XinGao Gong, and WanJian Yin.Crystal structure prediction by combining graph network andoptimization algorithm.Nature communications, 13(1):1492, 2022.
 [10]AntonO Oliynyk, LawrenceA Adutwum, BrentW Rudyk, Harshil Pisavadia, SogolLotfi, Viktor Hlukhyy, JamesJ Harynuk, Arthur Mar, and Jakoah Brgoch.Disentangling structural confusion through machine learning:structure prediction and polymorphism of equiatomic ternary phases abc.Journal of the American Chemical Society, 139(49):17870–17881,2017.
 [11]Yanchao Wang, Jian Lv, LiZhu, and Yanming Ma.Crystal structure prediction via particleswarm optimization.Physical Review B, 82(9):094116, 2010.
 [12]DavidC Lonie and Eva Zurek.Xtalopt: An opensource evolutionary algorithm for crystal structureprediction.Computer Physics Communications, 182(2):372–387, 2011.
 [13]MadsPeterV Christiansen, Nikolaj Rønne, and Bjørk Hammer.Atomistic global optimization x: A python package for optimization ofatomistic structures.The Journal of Chemical Physics, 157(5):054701, 2022.
 [14] David J. Wales and Jonathan P. K. Doye. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. The Journal of Physical Chemistry A, 101(28):5111–5116, 1997.
 [15]DavidA Kofke.On the acceptance probability of replicaexchange monte carlo trials.The Journal of chemical physics, 117(15):6911–6914, 2002.
 [16]MaltheK Bisbo and Bjørk Hammer.Global optimization of atomic structure enhanced by machine learning.Physical Review B, 105(24):245404, 2022.
 [17]Lai Wei, Nihang Fu, EdirisuriyaMD Siriwardane, Wenhui Yang, SadmanSadeedOmee, Rongzhi Dong, Rui Xin, and Jianjun Hu.Tcsp: a templatebased crystal structure prediction algorithm formaterials discovery.Inorganic Chemistry, 2022.
 [18]SeanD Griesemer, Logan Ward, and Chris Wolverton.Highthroughput crystal structure solution using prototypes.Physical Review Materials, 5(10):105003, 2021.
 [19]Yanchao Wang, Jian Lv, LiZhu, and Yanming Ma.Crystal structure prediction via particleswarm optimization.Physical Review B, 82(9):094116, 2010.
 [20]PavelE Dolgirev, IvanA Kruglov, and ArtemR Oganov.Machine learning scheme for fast extraction of chemicallyinterpretable interatomic potentials.AIP Advances, 6(8):085318, 2016.
 [21]EvgenyV Podryabinkin, EvgenyV Tikhonov, AlexanderV Shapeev, and ArtemROganov.Accelerating crystal structure prediction by machinelearninginteratomic potentials with active learning.Physical Review B, 99(6):064114, 2019.
 [22]Qunchao Tong, Lantian Xue, Jian Lv, Yanchao Wang, and Yanming Ma.Accelerating calypso structure prediction by datadriven learning ofa potential energy surface.Faraday discussions, 211:31–43, 2018.
 [23]Qiuping Yang, Jian Lv, Qunchao Tong, Xin Du, Yanchao Wang, Shoutao Zhang,Guochun Yang, Aitor Bergara, and Yanming Ma.Hard and superconducting cubic boron phase via swarmintelligencestructural prediction driven by a machinelearning potential.Physical Review B, 103(2):024505, 2021.
 [24]Qunchao Tong, Xiaoshan Luo, AdebayoA Adeleke, Pengyue Gao, YuXie, Hanyu Liu,Quan Li, Yanchao Wang, Jian Lv, Yansun Yao, etal.Machine learning metadynamics simulation of reconstructive phasetransition.Physical Review B, 103(5):054107, 2021.
 [25] So Takamoto, Satoshi Izumi, and Ju Li. TeaNet: universal neural network interatomic potential inspired by iterative electronic relaxations. Computational Materials Science, 207:111280, 2022.
 [26]SoTakamoto, Chikashi Shinagawa, Daisuke Motoki, Kosuke Nakago, Wenwen Li, IoriKurata, Taku Watanabe, Yoshihiro Yayama, Hiroki Iriguchi, Yusuke Asano,etal.Towards universal neural network potential for material discoveryapplicable to arbitrary combination of 45 elements.Nature Communications, 13(1):2991, 2022.
 [27]Kamal Choudhary, Brian DeCost, Lily Major, Keith Butler, Jeyan Thiyagalingam,and Francesca Tavazza.Unified graph neural network forcefield for the periodic table:solid state applications.Digital Discovery, 2023.
 [28]SadmanSadeed Omee, Lai Wei, Ming Hu, and Jianjun Hu.Crystal structure prediction using neural network potential andagefitness pareto genetic algorithm.Journal of Materials Informatics, 2024.
 [29]Minoru Kusaba, Chang Liu, and Ryo Yoshida.Crystal structure prediction with machine learningbased elementsubstitution.Computational Materials Science, 211:111496, 2022.
 [30]Farren Curtis, Xiayue Li, Timothy Rose, Alvaro VazquezMayagoitia, SaswataBhattacharya, LucaM Ghiringhelli, and Noa Marom.Gator: a firstprinciples genetic algorithm for molecular crystalstructure prediction.Journal of chemical theory and computation, 14(4):2246–2264,2018.
 [31]ChrisJ Pickard and RJNeeds.Highpressure phases of silane.Physical review letters, 97(4):045504, 2006.
 [32]ChrisJ Pickard and RJNeeds.Ab initio random structure searching.Journal of Physics: Condensed Matter, 23(5):053201, 2011.
 [33]MaltheK Bisbo and Bjørk Hammer.Global optimization of atomistic structure enhanced by machinelearning.arXiv preprint arXiv:2012.15222, 2020.
 [34]Will Tipton and Richard Hennig.Gasp: The genetic algorithm for structure and phase prediction, 2012.
 [35]Chi Chen and ShyuePing Ong.A universal graph deep learning interatomic potential for theperiodic table.Nature Computational Science, 2(11):718–728, 2022.
 [36]HenrikLund Mortensen, SørenAger Meldgaard, MaltheKjær Bisbo,MadsPeterV Christiansen, and Bjørk Hammer.Atomistic structure learning algorithm with surrogate energy modelrelaxation.Physical Review B, 102(7):075427, 2020.
 [37] Tomoki Yamashita, Shinichi Kanehira, Nobuya Sato, Hiori Kino, Kei Terayama, Hikaru Sawahata, Takumi Sato, Futoshi Utsuno, Koji Tsuda, Takashi Miyake, et al. CrySPY: a crystal structure prediction tool accelerated by machine learning. Science and Technology of Advanced Materials: Methods, 1(1):87–97, 2021.
 [38]DavidC Lonie and Eva Zurek.Xtalopt: An opensource evolutionary algorithm for crystal structureprediction.Computer Physics Communications, 182(2):372–387, 2011.
 [39]Jianjun Hu, Yong Zhao, Yuqi Song, Rongzhi Dong, Wenhui Yang, Yuxin Li, andEdirisuriya Siriwardane.Alphacrystal: Contact map based crystal structure prediction usingdeep learning.arXiv preprint arXiv:2102.01620, 2021.
 [40]Yuqi Song, Rongzhi Dong, Lai Wei, Qin Li, and Jianjun Hu.Alphacrystalii: Distance matrix based crystal structure predictionusing deep learning.arXiv preprint arXiv:2404.04810, 2024.
 [41]Georg Kresse and Jürgen Furthmüller.Efficient iterative schemes for ab initio totalenergy calculationsusing a planewave basis set.Physical review B, 54(16):11169, 1996.
 [42]JohnP Perdew, Kieron Burke, and Matthias Ernzerhof.Generalized gradient approximation made simple.Physical review letters, 77(18):3865, 1996.
 [43]PeterE Blöchl.Projector augmentedwave method.Physical review B, 50(24):17953, 1994.
 [44]Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and ShyuePing Ong.Graph networks as a universal machine learning framework formolecules and crystals.Chemistry of Materials, 31(9):3564–3572, 2019.
 [45]Xiangyu Yin and ChrysanthosE Gounaris.Search methods for inorganic materials crystal structure prediction.Current Opinion in Chemical Engineering, 35:100726, 2022.
 [46] Karsten Wedel Jacobsen, J. K. Norskov, and Martti J. Puska. Interatomic interactions in the effective-medium theory. Physical Review B, 35(14):7423, 1987.
 [47] Pierre Hohenberg and Walter Kohn. Inhomogeneous electron gas. Physical Review, 136(3B):B864, 1964.
 [48] Lu Jeu Sham and Walter Kohn. One-particle properties of an inhomogeneous interacting electron gas. Physical Review, 145(2):561, 1966.
 [49]MaltheK Bisbo and Bjørk Hammer.Efficient global structure optimization with a machinelearnedsurrogate model.Physical review letters, 124(8):086102, 2020.
 [50]JeanPhilippe Vert, Koji Tsuda, and Bernhard Schölkopf.A primer on kernel methods.Kernel methods in computational biology, 47:35–70, 2004.
 [51]AlbertP Bartók, Risi Kondor, and Gábor Csányi.On representing chemical environments.Physical Review B, 87(18):184115, 2013.
 [52]LasseB Vilhelmsen and Bjørk Hammer.A genetic algorithm for first principles global structureoptimization of supported nano structures.The Journal of chemical physics, 141(4):044711, 2014.
 [53] Haitham Seada and Kalyanmoy Deb. U-NSGA-III: a unified evolutionary optimization procedure for single, multiple, and many objectives: proof-of-principle results. In International Conference on Evolutionary Multi-Criterion Optimization, pages 34–49. Springer, 2015.
 [54]MichaelD Schmidt and Hod Lipson.Agefitness pareto optimization.In Proceedings of the 12th annual conference on Genetic andevolutionary computation, pages 543–544, 2010.
 [55]Brian Kulis etal.Metric learning: A survey.Foundations and Trends® in Machine Learning,5(4):287–364, 2013.
 [56]Chang Liu, Erina Fujita, Yukari Katsura, Yuki Inada, Asuka Ishikawa, RyujiTamura, Kaoru Kimura, and Ryo Yoshida.Machine learning to predict quasicrystals from chemical compositions.Advanced Materials, 33(36):2102507, 2021.
 [57]Anubhav Jain, ShyuePing Ong, Geoffroy Hautier, Wei Chen, WilliamDavidsonRichards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, GerbrandCeder, etal.Commentary: The materials project: A materials genome approach toaccelerating materials innovation.APL materials, 1(1), 2013.
 [58]CameronJ Hargreaves, MatthewS Dyer, MichaelW Gaultois, VitaliyA Kurlin, andMatthewJ Rosseinsky.The earth mover’s distance as a metric for the space of inorganiccompositions.Chemistry of Materials, 32(24):10610–10620, 2020.
 [59]Nihang Fu, Jeffrey Hu, Ying Feng, Gregory Morrison, HansConradzur Loye, andJianjun Hu.Composition based oxidation state prediction of materials using deeplearning language models.Advanced Science, 10(28):2301011, 2023.
 [60]Wenhui Yang, Edirisuriya MDilanga Siriwardane, Rongzhi Dong, Yuxin Li, andJianjun Hu.Crystal structure prediction of materials with high symmetry usingdifferential evolution.Journal of Physics: Condensed Matter, 33(45):455902, 2021.
 [61]Jeremy Rapin and Olivier Teytaud.Nevergrad: a gradientfree optimization platform. github.FacebookResearch/Nevergrad, 2018.
 [62] Greg Landrum et al. RDKit: a software suite for cheminformatics, computational chemistry, and predictive modeling. 2013.
 [63] Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, et al. Molecular Sets (MOSES): a benchmarking platform for molecular generation models. Frontiers in Pharmacology, 11:565644, 2020.
 [64]Lai Wei, Qin Li, SadmanSadeed Omee, and Jianjun Hu.Towards quantitative evaluation of crystal structure predictionperformance.Computational Materials Science, 235:112802, 2024.
 [65]Xiaoshan Luo, Zhenyu Wang, Pengyue Gao, Jian Lv, Yanchao Wang, Changfeng Chen,and Yanming Ma.Deep learning generative model for crystal structure prediction.arXiv preprint arXiv:2403.10846, 2024.
 [66]Amil Merchant, Simon Batzner, SamuelS Schoenholz, Muratahan Aykol, GowoonCheon, and EkinDogus Cubuk.Scaling deep learning for materials discovery.Nature, 624(7990):80–85, 2023.
 [67]Kuo Bao, Stefan Goedecker, Kenji Koga, Frédéric Lançon, andAlexey Neelov.Structure of large gold clusters obtained by global optimizationusing the minima hopping method.Physical Review B, 79(4):041405, 2009.
 [68]Tian Xie, Xiang Fu, OctavianEugen Ganea, Regina Barzilay, and TommiSJaakkola.Crystal diffusion variational autoencoder for periodic materialgeneration.In International Conference on Learning Representations, 2021.
 [69]Chang Liu, Hiromasa Tamaki, Tomoyasu Yokoyama, Kensuke Wakasugi, SatoshiYotsuhashi, Minoru Kusaba, and Ryo Yoshida.Shotgun crystal structure prediction using machinelearned formationenergies.arXiv preprint arXiv:2305.02158, 2023.
 [70]MichaelJ Mehl, David Hicks, Cormac Toher, Ohad Levy, RobertM Hanson, GusHart, and Stefano Curtarolo.The aflow library of crystallographic prototypes: part 1.Computational Materials Science, 136:S1–S828, 2017.