5.3. Running a simple DFT calculation on PACE Supercomputer Cluster#

In this lecture, we will learn how to run DFT calculations on PACE. We will first look at the SLURM submission scripts that are used to submit jobs on PACE. As a member of the Medford group, you will have access to two supercomputer clusters at Georgia Tech: PACE-Phoenix and PACE-Hive. If you are part of the VIP course, you will have access to the PACE-ICE cluster. The line “#SBATCH -A gts-amedford6” requests resources from your PI’s account, so you should not use it on PACE-ICE. If you are on PACE-Phoenix, you will need to request resources from your PI’s account. This means that you are paying for the jobs you run, so these resources should be used judiciously. Note: any code block beginning with #!/bin/bash indicates bash commands or a bash script; all other code should be assumed to be Python. An example submission script (run.sbatch) for a SPARC job is given below:

#!/bin/bash
#SBATCH -N 1 --ntasks-per-node=8
#SBATCH --mem-per-cpu=4G 
#SBATCH -t12:00:00
#SBATCH -A gts-amedford6 
#SBATCH -J optimizer
#SBATCH -o Report-%j.out -e Report_err-%j.err

cd $SLURM_SUBMIT_DIR

source ~/.bashrc

module load anaconda3 #It is optional to load anaconda. You don't need to load it if it's already been loaded in the environment 

source /path/to/sparc_env.sh

python calc.py

To submit a job to the SLURM system, you use the sbatch command. We usually call this file run.sbatch for convenience, so the command for submitting the job is sbatch run.sbatch. When you do this, you are handing the file to the SLURM queuing system, which scans it for the lines containing #SBATCH that specify the resources you are requesting for the job. Each line starting with #SBATCH specifies something different. The line with the -N tag requests compute resources (nodes and processors per node); you can request resources based on the requirements of your calculation. In this example script, we are requesting 1 node and 8 processors per node. The next line (#SBATCH --mem-per-cpu=4G) specifies the memory requested per processor, here 4 GB, so the total memory requested is 32 GB. On the Phoenix cluster, jobs run in the default queue, inferno. -J is the name you want to give the job, and -o and -e are the filenames where the output and errors will be written, respectively. You can run pace-whoami to check the maximum resources available for a particular queue.
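For reference, a typical submit-and-monitor workflow from the directory containing run.sbatch and calc.py looks like the following (the job ID printed by sbatch will differ for you):

#!/bin/bash
# Submit the job script to the SLURM scheduler
sbatch run.sbatch

# Check the status of your queued and running jobs
squeue -u $USER

# Cancel a job if needed (replace <jobid> with the ID printed by sbatch)
scancel <jobid>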

If you are submitting a job to PACE-Hive, you will need to specify the queue using #SBATCH -q. Hive has different queues such as hive, hive-himem, and hive-interact, which offer different resources and walltimes. After the #SBATCH block, you can source your environment and run your code. The cd $SLURM_SUBMIT_DIR line changes into the directory from which the job was submitted, so the job runs in the correct folder. More information about SLURM scripts and job submission on PACE can be found in this link.
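As a sketch, a Hive submission header might look like the block below; the queue name, walltime, and resources are illustrative and should be adjusted to what pace-whoami reports for your account:

#!/bin/bash
#SBATCH -N 1 --ntasks-per-node=8     # 1 node, 8 processors per node
#SBATCH --mem-per-cpu=4G             # memory per processor
#SBATCH -t 12:00:00                  # walltime
#SBATCH -q hive                      # Hive queue (e.g., hive, hive-himem, hive-interact)
#SBATCH -J optimizer
#SBATCH -o Report-%j.out -e Report_err-%j.err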

5.3.1. Running SPARC DFT code on PACE#

For the purposes of the VIP course and training, we will be using SPARC to perform DFT calculations, since SPARC is relatively easy to compile on PACE compared to QE. In this lesson, we will set up our own environment for SPARC, compile the code, and learn how to perform DFT calculations using the SPARC Python interface.

5.3.1.1. Getting Set Up#

Let’s start by creating a conda environment. After logging in to your PACE account, use the following commands:

#!/bin/bash
module load anaconda3
conda create --name sparc python=3.9  #create an environment named 'sparc'

For running SPARC, we will need to install the SPARC code and the sparc-dft-api package (to use the Python interface for SPARC), and set the required environment variables. Note: there are at least two compiler families available on PACE: gcc and intel. If you choose to use the intel compiler, you will need to add -no-multibyte-chars to the line that starts with CFLAGS in the makefile. When running SPARC, you will need to module load the same compiler that you used for compilation. When possible, we also recommend loading specific module versions. For example, ml anaconda3 loads a default from one of several anaconda versions available; these defaults are liable to change (without notice) and break your code. It is therefore preferable to ml anaconda3/2022.05; if this module ever becomes unavailable, you will know, because this will break! The following are the commands for loading the modules and setting up the environment (a short example of checking and pinning module versions follows the setup commands below):

#!/bin/bash

# if on rh7 nodes
module load intel mvapich2
# OR if on rh9 and pace-ice
ml intel-oneapi-compilers intel-oneapi-mpi intel-oneapi-mkl

# If need to activate the conda environment,
module load anaconda3
conda activate sparc

# Create a directory called "packages" by using the "mkdir packages" command and then go to the directory.
mkdir ~/packages
cd ~/packages

# Clone the SPARC code in packages directory
git clone https://github.com/SPARC-X/SPARC.git

# Go to makefile inside the directory src and set the variables as per instructions in SPARC documentation 
# Compiling SPARC code while still in src
cd SPARC/src/
make clean; make -j


# Install the sparc-dft-api package
python -m pip install git+https://github.com/SPARC-X/SPARC-X-API

# Download pseudopotential files
python -m sparc.download_data
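As mentioned above, it is safer to pin module versions and to confirm what is actually loaded. The version number below is only an example; run module avail to see what is installed on your cluster:

#!/bin/bash
# List the versions of a module available on this cluster
module avail anaconda3

# Load a specific version instead of the changing default
module load anaconda3/2022.05

# Show everything currently loaded, so compile-time and run-time modules match
module list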

After installing all the required packages, we can set up an environment file specifically for running SPARC DFT calculations. Open a text editor in the terminal (e.g., vim or nano) and copy the following lines into the file. You will need to update the placeholder paths in the file based on where your directories and SPARC executable are located. The name of this environment file can be sparc_env.sh.

#!/bin/bash
#loading the modules

module load intel mvapich2
# OR
ml intel-oneapi-compilers intel-oneapi-mpi intel-oneapi-mkl

module load anaconda3
conda activate sparc

export PATH=/path/to/packages/SPARC/lib:$PATH

export SPARC_PSP_PATH=/path/to/psp_dir/
if [[ -z "${SLURM_NTASKS}" ]]; then
  export ASE_SPARC_COMMAND="/path/to/packages/SPARC/lib/sparc"
else
  export ASE_SPARC_COMMAND="srun /path/to/packages/SPARC/lib/sparc"
fi

What is this doing? It is adding the directory containing the SPARC executable to PATH, the variable Linux searches through to find the programs and commands we ask it to run, and it tells ASE how to launch SPARC (with srun inside a SLURM job, or directly otherwise). Please note that you should replace the placeholder /path/to/... locations with the actual paths on your account. After setting the environment variables in the environment file, we can source it when running a DFT calculation by specifying it in your SLURM script:

#!/bin/bash
source /path/to/packages/sparc_env.sh
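After sourcing the environment file, you can quickly check that the variables were set as intended, for example:

#!/bin/bash
# Confirm the SPARC binary is found on PATH (should print /path/to/packages/SPARC/lib/sparc)
which sparc

# Confirm the pseudopotential directory and launch command were exported
echo $SPARC_PSP_PATH
echo $ASE_SPARC_COMMAND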

5.3.1.2. Running a calculation using SPARC python interface#

Using ASE to run DFT is just like running EMT, as we did in the ASE lecture. Let’s copy the example Python script below into a new file (e.g., calc_sparc.py):

# import sparc-api and ase 

from sparc import SPARC
from ase.build import molecule
from ase.units import Bohr,Hartree,mol,kcal,kJ,eV

# make the atoms
atoms = molecule('H2O')
atoms.cell = [[8,0,0],[0,8,0],[0,0,8]]
atoms.center()

# setup calculator
parameters = dict(
                EXCHANGE_CORRELATION = 'GGA_PBE',
                D3_FLAG=1,   #Grimme D3 dispersion correction
                SPIN_TYP=0,   #non spin-polarized calculation
                KPOINT_GRID=[1,1,1],  #molecule needs single kpt !
                ECUT=500/Hartree,   #set ECUT (Hartree) or h (Angstrom)
                #h = 0.15,
                TOL_SCF=1e-5,
                RELAX_FLAG=1, #Do structural relaxation (only atomic positions)
                PRINT_FORCES=1,
                PRINT_RELAXOUT=1)

calc = SPARC(atoms = atoms, **parameters)

# set the calculator on the atoms and run
atoms.set_calculator(calc)
print(atoms.get_potential_energy())
Step  0
Step  1
Step  2
Step  3
Step  4
{}
[2, 0, 1] [1, 2, 0]
-482.41097204076414
/home/hice1/nyu49/.local/lib/python3.9/site-packages/sparc/sparc_parsers/utils.py:56: UserWarning: Key POISSON_SOLVER not in validator's parameter list, ignore value conversion!
  warn(
/home/hice1/nyu49/.local/lib/python3.9/site-packages/sparc/sparc_parsers/utils.py:56: UserWarning: Key VERBOSITY not in validator's parameter list, ignore value conversion!
  warn(
/home/hice1/nyu49/.local/lib/python3.9/site-packages/sparc/sparc_parsers/out.py:108: UserWarning: Key atomic mass from run information appears multiple times in your outputfile!
  warn(

There are now keyword arguments passed to the calculator. KPOINT_GRID sets the density at which we sample reciprocal space. h is the grid spacing of our real-space mesh basis set. TOL_SCF is the convergence criterion for the SCF cycle. RELAX_FLAG tells SPARC that we want to perform a structural relaxation. You will need to specify these parameters (keyword arguments) before running DFT calculations. If you are using LDA or GGA_PBE as the exchange-correlation functional, you will need to run convergence tests on KPOINT_GRID and h (mesh spacing), as sketched below.
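A minimal sketch of an h convergence test is shown below; it reuses the H2O setup from calc_sparc.py and simply re-runs a single-point energy at several mesh spacings (the h values and the ~1 meV/atom criterion are illustrative, not prescriptive):

# Sketch of a mesh-spacing (h) convergence test for the H2O example above
from sparc import SPARC
from ase.build import molecule

atoms = molecule('H2O')
atoms.cell = [[8, 0, 0], [0, 8, 0], [0, 0, 8]]
atoms.center()

energies = {}
for h in [0.25, 0.20, 0.15, 0.10]:  # coarse -> fine mesh spacing in Angstrom
    calc = SPARC(atoms=atoms,
                 EXCHANGE_CORRELATION='GGA_PBE',
                 KPOINT_GRID=[1, 1, 1],  # single k-point for a molecule
                 h=h)
    atoms.calc = calc
    energies[h] = atoms.get_potential_energy()
    print(h, energies[h])

# The mesh is converged once the energy change between successive h values
# falls below your chosen tolerance (e.g., ~1 meV/atom).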

Let’s run it (don’t normally do this on the headnode). Note that it creates many output files; “SPARC.out” is the main one.

At the top of the output are the settings; many of these are defaults we did not enter. Next we see Self Consistent Field (SCF) blocks, which converge the electron density of the structure. Once converged, SPARC computes the energy and forces (electronic relaxation), then moves the atoms down the potential energy surface (ionic relaxation).

5.3.1.2.1. Important Note:#

A calculation with a denser k-point grid and/or a finer mesh (smaller mesh spacing h) will be more accurate. Sometimes you will come across convergence issues while running calculations. In such cases, you can look at your main output file (*.out), which records the energies of each SCF cycle, and find out where it breaks. There might be a warning, for example:

“WARNING: SCF#1 did not converge to desired accuracy!”

You can adjust SPARC’s input mixing parameters in such cases. Some parameters that can be modified if the SCF is not converging are:

MIXING_VARIABLE, MIXING_PARAMETER, MIXING_HISTORY, MIXING_PRECOND, CHEB_DEGREE
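As a sketch, these parameters can be passed through the Python interface just like the others; the specific values below are illustrative starting points only, so consult the SPARC manual for each parameter’s meaning and allowed values:

# Illustrative only: pass SCF mixing controls through the SPARC calculator
from sparc import SPARC
from ase.build import molecule

atoms = molecule('H2O')
atoms.cell = [[8, 0, 0], [0, 8, 0], [0, 0, 8]]
atoms.center()

parameters = dict(
    EXCHANGE_CORRELATION='GGA_PBE',
    KPOINT_GRID=[1, 1, 1],
    h=0.15,
    TOL_SCF=1e-5,
    MIXING_VARIABLE='density',   # mix the density rather than the potential
    MIXING_PARAMETER=0.1,        # smaller value -> more cautious (slower) updates
    MIXING_HISTORY=7,            # number of previous SCF steps used in mixing
    MIXING_PRECOND='kerker',     # preconditioner applied to the mixing
    CHEB_DEGREE=25)              # Chebyshev polynomial degree of the eigensolver

calc = SPARC(atoms=atoms, **parameters)
atoms.calc = calc
print(atoms.get_potential_energy())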

More information about the input parameters can be found in the SPARC manual that is available on GitHub (link).

5.3.1.3. Submitting DFT calculations#

Let’s copy our SPARC Python script and SLURM batch file into the job directory. Modify the SLURM file to source the SPARC environment and run the script. The SLURM script should contain the following lines:

source /path/to/packages/sparc_env.sh
python calc_sparc.py
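Putting the pieces together, a complete run.sbatch for this calculation could look like the sketch below; the account line, job name, requested resources, and paths are placeholders to adapt (omit the -A line on PACE-ICE):

#!/bin/bash
#SBATCH -N 1 --ntasks-per-node=8
#SBATCH --mem-per-cpu=4G
#SBATCH -t 12:00:00
#SBATCH -A gts-amedford6        # omit on PACE-ICE
#SBATCH -J h2o-sparc
#SBATCH -o Report-%j.out -e Report_err-%j.err

cd $SLURM_SUBMIT_DIR

source ~/.bashrc
source /path/to/packages/sparc_env.sh

python calc_sparc.py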

5.3.1.4. What is SPARC doing?#

In a broad sense, SPARC is solving the Schrödinger equation approximately. It does this through an iterative calculation: you begin with some initial guess of the electron density, then refine it by minimizing the energy. Once the convergence criteria are met, the iteration terminates and the energy and forces can be calculated. A toy sketch of this self-consistent loop is given below.
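The code below is a toy illustration of the self-consistency idea only, not SPARC’s actual algorithm: repeatedly update a quantity, mix old and new values, and stop once the change between iterations falls below a tolerance, which is the same structure an SCF density loop follows.

import math

# Toy self-consistent (fixed-point) iteration with linear mixing
def toy_scf(update, x, mixing=0.3, tol=1e-10, max_iter=500):
    for iteration in range(max_iter):
        x_new = (1 - mixing) * x + mixing * update(x)  # linear mixing step
        if abs(x_new - x) < tol:
            return x_new, iteration  # "converged"
        x = x_new
    raise RuntimeError("did not converge within max_iter")

# Self-consistent point of a simple nonlinear map (stands in for the density update)
value, iterations = toy_scf(math.cos, x=1.0)
print(value, iterations)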

5.3.1.5. Exercise:#

  1. Clone the SPARC and sparc-dft-api repositories from GitHub and set up an environment for running SPARC calculations on PACE.

  2. Write a script to build a Platinum bulk crystal and run a single-point DFT calculation in SPARC.