Threshold simulation for Benzene — mol-CSPy 1.0 documentation (2024)

Threshold simulation for Benzene — mol-CSPy 1.0 documentation (1)

Step 1: Obtain a geometry file for Benzene

Typically these should be optimized gas phase conformations, from aprogram like Gaussian or otherwise.

For more information, please see the Crystal structure prediction of Acetic acid example.

Step 2: Perform distributed multipole analysis

The distributed multipole analysis can be performed as follows:

cspy-dma benzene.xyz

This will produce the following files:

benzene.mols # molecular axis definition (NEIGHCRYS/DMACRYS) formatbenzene.dma # molecular multipolesbenzene_rank0.dma # molecular charges in the same format (probably from MULFIT or similar)

Step 3: Configure the TOML file

Below is an example contents of the cspy.toml file for this simulation.

csp_minimization_step = [{ kind = "dmacrys", electrostatics = "multipoles" },{ kind = "dmacrys", CONP = true, PRES = "0.1 GPa", electrostatics = "multipoles"},{ kind = "dmacrys", electrostatics = "multipoles" },][mc]num_steps = 100move = [ [ "tra", "0.0", "0.5"], [ "rot", "0.0", "0.05"], [ "u_a", "0.0", "0.5"], [ "u_l", "0.0", "0.5"], [ "vol", "0.0", "25"]]move_scale = 1.0move_all = falseauto_prob = trueauto_cutoff = true[basin_hopping]dump_accept = false[threshold]interval_para = [ "fixed", "10", ]increase_para = [ "fixed", "5.0", ]minimize = true[dmacrys]timeout = 200.0

[csp_minimization_step] - This section contains a list of dictionaries which individually describe a minimization step. In the above example,there are three lattice energy minimzation steps performed by DMACRYS. All steps use an anisotropic atom–atom force-field energy modelplus electrostatic interactions described by a multipole series (up to hexadecapole) using distributed multipole analysis. The difference in step 2from the other steps is that an external pressure of 0.1 GPa is applied.

[mc] & [basin hopping] - These sections contain settings for the Monte Carlo simulation that are used in both threhold and QRBH simulations.num_steps is the total length of the simulation is 100 steps (accepted and rejected). move is used to define the moveset by a list of listswith each sublist specifying a movetype and its parameters. The first param of a move is its type, the available types are: molecular translation (tra),molecular rotation (rot), unit cell angles (u_a), unit cell lengths (u_l), and unit cell volume (vol). The second parameter is theselection probability. If the probabilities are 0 or auto_prob is true then the probabilities will be calculated as the proportion of the total degrees of freedom.The third parameter is the cutoff for the move, i.e.the maximum magnitude possible for that move. The units for each move are as follows: translation is Å,rotation is degrees, volume is Å^{3}. The auto_cutoff option scales the volume cutoff based on the number of molecules in the unit cell.move_scale scales the cutoffs by a scalar each time the energy lid is increased. move_all is used to specify whether to apply all available types of moveat one perturbation or to apply each move as a single pertubation. Finally, the dump_accept keyword in the [basin_hopping] section is used toonly dump accepted structures if set to true.

[threhold] - This section is used to set the energy threhold and steps per lid. Here, interval_para is the number of steps per energy threshold/lid.The first parameter specifies the type strategy, e.g.fixed means same number of steps each energy lid. Likewise, increase_para is the energy increasebetween energy thresholds, e.g.increases 5.0 kJ/mol each iteration. Again the first parameter determines the strategy. The total number of lids is determinedby dividing the num_steps by the interval_para steps. The minimize option sets whether to do the minimization of the accepted structures concurrently. Ifrunning many trajectories, it can be more efficient to just do the MC trajectories by themselves using 1 core per trajectory and then do the optimizationsafterwards using large scale parallelisation (e.g. using cspy-reoptimize). Minimizing the accepted structures is essential since this is how we determinewhat minima are sampled during the trajectory. Typically, we want the same energy surface throughout, i.e. the minimization steps in the cspy.toml shouldbe the same (though including a pressure step is good since high energy structures sampled during the threshold simulation can often exhibit vacuum gaps between layers).

[dmacrys] - This section is used to specify settings for the DMACRYS program. In this example, it is used to modify the timeout whichtells DMACRYS how long to perform a calculation before stopping and moving on to another calculation.

Step 4: Obtaining a starting crystal structure of benzene

The threshold algorithm is usually initiated several local minima on the energy landscape and therefore requires input crystal structures for benzene.In this example, we demonstrate using a single crystal structure - benzene_form_III_csp_P1.res corresponding to the high pressure form III polymorphof benzene as found after performing crystal structure prediction. It is important that the initial crystal structures are converted to P1 symmetry.This can be achieved with the following python script:

from cspy.crystal import Crystalimport syscrystal = Crystal.load(sys.argv[1])p1_crystal = crystal.as_P1()p1_crystal.save(sys.argv[2])

The python script is then run on the command line:

python SCRIPT.py input_structure output_structure

Where input_structure is the input SHELX or CIF crystal structure and output_structure is the filename you want to save the P1 structure to. The file extensionused on output_structure will dictate the type of file that is written (i.e. .res will be a SHELX file and .cif will be a CIF file.)

Below is the file contents of benzene_form_III_csp_P1.res.

TITL benzene_form_III_csp-P1-1-1-1CELL 0.7 8.0828 5.5371 5.6702 90 71.6477 90LATT -1SFAC C HC1 1 0.742101910000 0.332654780000 0.330318380000C2 1 0.640453210000 0.312381590000 0.578596760000C3 1 0.851634060000 0.531097440000 0.251713210000C4 1 0.648362810000 0.490526050000 0.748280230000C5 1 0.859550100000 0.709251950000 0.421390990000C6 1 0.757901400000 0.688978750000 0.669669360000H1 2 0.735965720000 0.194028970000 0.198294610000H2 2 0.555227380000 0.158003620000 0.639830350000H3 2 0.930733010000 0.546864180000 0.058529300000H4 2 0.569263860000 0.474759320000 0.941464150000H5 2 0.944775480000 0.863618510000 0.360178410000H6 2 0.764036190000 0.827588620000 0.801697560000C7 1 0.257898090000 0.832654780000 0.669681620000C8 1 0.359546790000 0.812381590000 0.421403240000C9 1 0.148365940000 1.031097440000 0.748286790000C10 1 0.351637190000 0.990526050000 0.251719770000C11 1 0.140449900000 1.209251950000 0.578609010000C12 1 0.242098600000 1.188978750000 0.330330640000H7 2 0.264034280000 0.694028970000 0.801705390000H8 2 0.444772620000 0.658003620000 0.360169650000H9 2 0.069266990000 1.046864180000 0.941470700000H10 2 0.430736140000 0.974759320000 0.058535850000H11 2 0.055224520000 1.363618510000 0.639821590000H12 2 0.235963810000 1.327588620000 0.198302440000

Step 5: Running the threshold calculation

The threshold calculation can be run on a SLURM cluster with the following script:

#!/bin/bash#SBATCH --nodes=5#SBATCH --ntasks-per-node=40#SBATCH --time=24:00:00cd $SLURM_SUBMIT_DIRsource ~/.bashrcconda activate cspympiexec cspy-threshold benzene.xyz -r benzene_form_III_csp_P1.res -a benzenex2.mols -m benzenex2.dma -c benzenex2_rank0.dma -g 1 --trials-per-res 1

Note

A space group flag (-g) is currently required, however, this does not actually determinethe space group sampled (that is determined by the space group of thecrystal files). This flag simply determines the space group label in thestructure ids.Generally, all threshold simulations should be done in P1 since any space group symmetryrepresents a constraint on the transitions that can be sampled and soleads to overestimating the lowest energy connection, i.e.remember toconvert your crystals to P1 before running threshold simulations.

For convergence, it can be useful to have multiple trajectories/trialsper structure. This is set with the –-trials-per-res option.

This will produce a database file benzene-1.db which stores crystal structures from the threshold simulationand status.txt. The contents of status.txt looks similar to this at the beginning of the simulation:

 target perturb accept reject valid invalid running target_trial active_trial finish_trial unique_min ttime mtime1 100 1 0 1 1 0 0 1 1 0 0 9.84 8.42
  • target - The requested number of simulation steps

  • perturb - The number of pertubations that have been carried out

  • accept - The number of accepted pertubations

  • reject - The number of rejected pertubations

  • valid - The number of valid optimisations

  • invalid - The number of invalid optimisations

  • running - The number of running jobs (currently always reads 0 but iwll be fixed in next update)

  • target_trial - The requested number of trials/trajectories

  • active_trial - The number of running trials/trajectories

  • finish_trial - The number of complete trials/trajectories

  • unique_min - The number of unique structures

  • ttime - The total time

  • mtime - The minimization time

Step 6: Generating the Disconnectivity Graph

Firstly, the output database (benzene-1.db) must be clustered.

cspy-db cluster benzene-1.db

If the user has access to the CSD python API, we recommend following this up with COMPACK clustering.

cspy-db cluster benzene-1.db -m compack

The disconnectivity graph is then generated using the connectivity_graph.py file:

python $CSPY_DIR/clustering/connectivity_graph.py benzene-1.db

The script is ran on the clustered original database (NOT the output database! this does not contain the trajectory information and so will not work).

Threshold simulation for Benzene — mol-CSPy 1.0 documentation (2024)
Top Articles
Latest Posts
Article information

Author: Geoffrey Lueilwitz

Last Updated:

Views: 6207

Rating: 5 / 5 (80 voted)

Reviews: 87% of readers found this page helpful

Author information

Name: Geoffrey Lueilwitz

Birthday: 1997-03-23

Address: 74183 Thomas Course, Port Micheal, OK 55446-1529

Phone: +13408645881558

Job: Global Representative

Hobby: Sailing, Vehicle restoration, Rowing, Ghost hunting, Scrapbooking, Rugby, Board sports

Introduction: My name is Geoffrey Lueilwitz, I am a zealous, encouraging, sparkling, enchanting, graceful, faithful, nice person who loves writing and wants to share my knowledge and understanding with you.