Crystal structure prediction of Acetic acid — mol-CSPy 1.0 documentation (2024)

Crystal structure prediction of Acetic acid — mol-CSPy 1.0 documentation (1)

In this example we will show how to perform a CSP simulation for acetic acid (refcode: ACETAC01)

Step 1: Obtain a geometry file for acetic acid

Typically these should be optimized gas phase conformations, from aprogram like Gaussian or otherwise.

The molecular structures to be provided as input can either be drawnmanually using visualisation software (e.g. Chemcraft, GaussView, Molden etc.),or molecules can be extracted from crystal structures in the Cambridge Structural Database (CSD).You may also already have sets of XYZ coordinates from e.g. a molecular conformational search.

In the first case, after extracting the XYZ coordinates, a typical input file for Gaussian (given a .com extension)would look something like this:

# METHOD/BASIS-SET opt NoSymm EmpiricalDispersion=GD3BJGeometry optimisation calculation for Acetic Acid0 1C 2.19748 1.16892 0.98592C 1.18991 1.53866 2.00819H 2.42242 0.20859 -0.66920H 1.70368 2.08590 2.83258H 0.43923 2.20860 1.60955H 0.70543 0.68712 2.42875O 1.71300 0.43968 0.00000O 3.36610 1.51453 1.02054

Where METHOD and BASIS-SET are replaced with an approprioate electronic structure theory methodand basis set (e.g. B3LYP/6-311G**). The first line contains the functional and basis set(i.e. the level of theory being applied), the type of calculated (opt - geometry optimisation),a command to impose no symmetry on the molecule (nosymm), and a command to add a dispersioncorrection (DFT lacks any description of dispersion – van der Waals’ forces – so an empirical,post hoc correction improves the energy). The third line is a comment line, the fifth linegives the charge and multiplicity (number of spin configurations) respectively,and the remaining lines give the XYZ coordinates of each atom (in Angstroms).

The blank lines must be included in the file (including the final blank lines).

For further information regarding the Gaussian input file structure, we refer youto the Gaussian support page.

From herein, we will assume you have a (relaxed) molecular geometry in XYZ format in the file acetic.xyz.

Step 2: Perform distributed multipole analysis

The distributed multipole analysis can be performed as follows:

cspy-dma acetic.xyz

This will produce the following files:

acetic.mols # molecular axis definition (NEIGHCRYS/DMACRYS) formatacetic.dma # molecular multipolesacetic_rank0.dma # molecular charges in the same format (probably from MULFIT or similar)

Step 3: Perform crystal structure prediction

To perform a local CSP calculation for acetic acid, (sampling the top ten most commonly observed spacegroups) the following command can be used:

mpiexec -np NUM_CORES cspy-csp acetic.xyz -c acetic_rank0.dma -m acetic.dma -a acetic.mols -g fine10

Where NUM_CORES refers to the number of CPU cores you wish to run the calculation with. This command can also beincoprorated into a job submission script and used on a HPC facility. An example SLURM submission script for acetic acid isgiven below:

#!/bin/bash#SBATCH --nodes=5#SBATCH --ntasks-per-node=40#SBATCH --time=24:00:00cd $SLURM_SUBMIT_DIRmodule load conda/py3-latestsource activate cspympiexec cspy-csp molecule.xyz -c molecule_rank0.dma -m molecule.dma -a molecule.mols -g fine10

Once the calculation begins, a set of SQL databases will appear in the working directory. There will be one per spacegroupand will have the following naming scheme: acetic-SG.db (SG replaced with the spacegroup number)

Step 4: Remove duplicate structures

The database files that are output in step 3 will likely contain many duplicate structures. These arise in situations wherethe structure generator creates a number of structures that optimize into the same minimum on the force-field potential energy surface.We can remove duplicate structures using the following command:

cspy-db cluster acetic-*.db

This will find redundant structures within each of the database files, combine the unique structures into a new database file (defaulting tooutput.db), then find unique structures within the combined file (i.e. search for duplicates across the different spacegroups).

Step 5: Analyse the Landscape

Once you have removed duplicate structures you can use the final database to analyse the results. The following command can be usedto create a csv file of the final structures ordered by energy (default). Each structure will also be saved to a compressed archivein shelx format by default.

usage: cspy-db dump [-h] [-t TABLE_OUTPUT] [-r STRUCTURE_OUTPUT] [-f {cif,res}] [-d] [-e ENERGY] [--parse-metadata] [-s SORT_BY] [--log-level {INFO,DEBUG,ERROR,WARN}] databases [databases ...]

positional arguments

  • databases - Databases to process.

optional arguments

  • -h, --help - Show this help message and exit

  • -t TABLE_OUTPUT, --table-output TABLE_OUTPUT- Name of .csv output file

  • -r STRUCTURE_OUTPUT, --structure-output STRUCTURE_OUTPUT - Name of .zip output file

  • -f {cif,res}, --structure-filetype {cif,res}- File type for compressed structures

  • -d, --include-duplicates- Dump duplicate structures also

  • -e ENERGY, --energy ENERGY - Dump structures that are a within the inputted energy from the global minimum

  • --parse-metadata - Include metadata

  • -s SORT_BY, --sort-by SORT_BY - Sort by a specified column (id, spacegroup, density, energy, minimization_step, trial_number, minimization_time, metadata)

  • --log-level {INFO,DEBUG,ERROR,WARN} - Control level of logging output

We have provided some example python scripts in the Useful Scripts section which can be used to visualise the landscape.

Here is an example landscape from one of our simulations of acetic acid:

Crystal structure prediction of Acetic acid — mol-CSPy 1.0 documentation (2)

Step 6: Reoptimize the final crystal strucures (optional)

If the user wishes to reoptimize the final crystal structures with different minimization parameters (in this example we will use a larger force-field cutoff),the user can employ the cspy-reoptimize.

After clustering in step 4, the user is left with output.db. To reoptimize the crystal structures in this database, an axis file, xyz file of the molecule,charges file and multipoles file are needed

cspy-reoptimize output.db -x acetic.xyz -a acetic.mols -m acetic.dma -c acetic_rank0.dma -p fit --cutoff 30

This will produce a new SQL database which in this example, will have the name output.opt.db.

Crystal structure prediction of Acetic acid — mol-CSPy 1.0 documentation (2024)
Top Articles
Latest Posts
Article information

Author: Gregorio Kreiger

Last Updated:

Views: 6201

Rating: 4.7 / 5 (77 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Gregorio Kreiger

Birthday: 1994-12-18

Address: 89212 Tracey Ramp, Sunside, MT 08453-0951

Phone: +9014805370218

Job: Customer Designer

Hobby: Mountain biking, Orienteering, Hiking, Sewing, Backpacking, Mushroom hunting, Backpacking

Introduction: My name is Gregorio Kreiger, I am a tender, brainy, enthusiastic, combative, agreeable, gentle, gentle person who loves writing and wants to share my knowledge and understanding with you.