Referencing Binding Energies

Referencing Binding Energies#

by Gabriel S. Gusmão (Aug, 2021)

gusmaogabriels@gmail.com/gusmaogabriels@gatech.edu

Binding Energy#

Energy associated with a chemical state given reference energy states, i.e. it is not an absolute quantit. It is a function of reference energies.
Binding energies can be computed from raw-DFT energies or referenced from known reaction energies for a given chemical system.

import pandas as pd
%config Completer.use_jedi = False
from matplotlib import pyplot as plt
%matplotlib inline

Case Study microkinetic model for CO/CO$_2$ hydrogenation to methanol + water-gas shift by Dr. Grabow$^\dagger$#

42 elementary-reaction, 30 chemical-species#

$^\dagger$ Grabow, L. C. & Mavrikakis, M. Mechanism of Methanol Synthesis on Cu through CO2 and CO Hydrogenation. ACS Catal. 1, 365–384 (2011).

Followig, the raw data from Dr. Grabow’s paper is loaded, where dE are reaction energies in $eV$, and a separate (hydrogen reservoir) site is added for atomic hydrogen.

raw_data = pd.read_json('./data/grabow/raw_data.json');
raw_data

	rxn	dE
1	CO2_g + _s -> CO2_s	-0.08
2	H2_g + 2_h -> 2H_h	-0.29
3	CO_g + _s -> CO_s	-0.86
4	H2O_g + _s -> H2O_s	-0.21
5	HCOOH_g + _s -> HCOOH_s	-0.22
6	CH2O_g + _s -> CH2O_s	-0.04
7	CH3OH_g + _s -> CH3OH_s	-0.28
8	HCOOCH3_g + _s -> HCOOCH3_s	-0.10
9	CO2_s + _s -> CO_s + O_s	1.12
10	COOH_s + _s -> CO_s + OH_s	-0.14
11	COOH_s + _s + _h -> CO2_s + H_h + _s	-0.55
12	CO2_s + H2O_s + _s -> COOH_s + OH_s + _s	0.76
13	HCOOH_s + _h -> COOH_s + H_h	0.59
14	H2O_s + _h -> OH_s + H_h	0.21
15	OH_s + _h -> O_s + H_h	0.72
16	2OH_s -> H2O_s + O*_s	0.51
17	HCOO_s + _h -> CO2_s + H_h	0.25
18	H2CO2_s + _h -> HCOO_s + H_h	-0.87
19	HCOOH_s + _h -> HCOO_s + H_h	-0.23
20	CH3O2_s + _h -> HCOOH_s + H_h	-0.10
21	CH3O2_s + _h -> H2CO2_s + H_h	0.54
22	H2CO2_s + _s -> CH2O_s + O_s	0.91
23	CH3O2_s + _s -> CH2O_s + OH_s	0.74
24	CH3O_s + _h -> CH2O_s + H_h	1.02
25	CH3OH_s + _h -> CH3O_s + H_h	0.23
26	HCO_s + _h -> CO_s + H_h	-0.78
27	COH_s + _h -> CO_s + H_h	-1.15
28	HCOO_s + _s -> HCO_s + O_s	2.18
29	HCOH_s + _h -> HCO_s + H_h	-0.09
30	CH2O_s + _h -> HCO_s + H_h	0.40
31	CH2OH_s + _h -> CH2O_s + H_h	0.06
32	CH2OH_s + _h -> HCOH_s + H_h	0.55
33	CH3OH_s + _h -> CH2OH_s + H_h	1.19
34	HCOOH_s + _s -> HCO_s + OH_s	1.24
35	HCOOH_s + _s -> HCOH_s + O_s	2.04
36	CH3O2_s + _s -> CH2OH_s + O_s	1.39
37	CO3_s + _s -> CO2_s + O_s	-0.11
38	HCO3_s + _h -> CO3_s + H_h	1.21
39	CH3O_s + HCOO_s -> HCOOCH3_s + O_s	0.99
40	H2COOCH3_s + _s -> CH3O_s + CH2O_s	0.78
41	H2COOCH3_s + _h -> HCOOCH3_s + H_h +	-0.01
42	HCOOCH3_s + _s -> 2CH2O*_s	1.81

Binding Energies from Raw DFT Energies#

Atomic/elemental matrix#

The atomic matrix consists of the number of chemical elements of a specific type for each chemical species; therefore, it is a $ n_{species}\times n_{elements}$ matrix. Species here can be considered as states, since they encompass both gas-phase and bound chemical species.
Active sites are considered as pseudo-elements in the system

Below, we load a preprocessed atomic matrix for Dr. Grabow’s paper, Ma.

Ma_ = pd.read_json('./data/grabow/atomic_matrix.json')
display(Ma_.head(500))
Ma = Ma_.values

	*_h	*_s	C	H	O
*_h	1	0	0	0	0
*_s	0	1	0	0	0
CH2O*_s	0	1	1	2	1
CH2OH*_s	0	1	1	3	1
CH2O_g	0	0	1	2	1
CH3O*_s	0	1	1	3	1
CH3O2*_s	0	1	1	3	2
CH3OH*_s	0	1	1	4	1
CH3OH_g	0	0	1	4	1
CO*_s	0	1	1	0	1
CO2*_s	0	1	1	0	2
CO2_g	0	0	1	0	2
CO3*_s	0	1	1	0	3
COH*_s	0	1	1	1	1
COOH*_s	0	1	1	1	2
CO_g	0	0	1	0	1
H*_h	1	0	0	1	0
H2CO2*_s	0	1	1	2	2
H2COOCH3*_s	0	1	2	5	2
H2O*_s	0	1	0	2	1
H2O_g	0	0	0	2	1
H2_g	0	0	0	2	0
HCO*_s	0	1	1	1	1
HCO3*_s	0	1	1	1	3
HCOH*_s	0	1	1	2	1
HCOO*_s	0	1	1	1	2
HCOOCH3*_s	0	1	2	4	2
HCOOCH3_g	0	0	2	4	2
HCOOH*_s	0	1	1	2	2
HCOOH_g	0	0	1	2	2
O*_s	0	1	0	0	1
OH*_s	0	1	0	1	1

And folowing GGA-level (PBE) energies for each individial state are loaded for different metallic surfaces.

We are going to use Cu(111) as case study here.

E_raw = pd.read_json('./data/grabow/raw_dft.json')
E_raw.head(5)

	Cu(211)	Cu(111)	Pd(211)	Pt(211)	Rh(211)	Ag(211)	Au(211)
*_h	-197764.135245	-197765.085060	-123999.125283	-103185.773434	-106234.599516	-140714.147663	-134520.546438
*_s	-197764.135245	-197765.085060	-123999.125283	-103185.773434	-106234.599516	-140714.147663	-134520.546438
CH2O*_s	-198613.265305	-198613.816588	-124848.330119	-104034.866635	-107083.793192	-141562.980995	-135369.376106
CH2OH*_s	-198629.463463	-198629.895913	-124864.792637	-104051.846973	-107100.679487	-141578.898364	-135385.740832
CH2O_g	-848.688366	-848.688366	-848.688366	-848.688366	-848.688366	-848.688366	-848.688366

The number of degrees-of-freedom in this system equals the number of chemical elements, including the active sites, for which reference values must be defined.

To compute formation energies, we therefore need to define reference species. In this case we choose $CO_{(g)}$, $H{_2}_{(g)}$ and $H_2O_{(g)}$. Notice that all chemical elements ($C$, $O$, and $H$) must be included in the set of reference chemical species. We following build the extended matrix $M_{ext}=[M_a, I]$, where $I$ is the identity matrix $n_{species}\times n_{species}$, and drop the columns in $I$ associated with the defined chemical species. We finally obtain a full-rank and invertible matrix that can be used to solve for the dummy references $C$, $O$ and $H$, as follows.

\[M_{ext}\times\Delta E_{f}=E_{DFT}\]

ref_sps = ['CO_g','CO2_g','H2_g','*_s','*_h']         # define reference species
Eref    = np.array([0.,0.,0.,0.,0.])                  # reference species energies
mref = np.where([_ in ref_sps for _ in Ma_.index])[0] # find positions of reference species
mcols = np.array(list(set(range(len(Ma)))-set(mref))) # all other species

Solving the linear system

dEf_a  = np.zeros(len(Ma_))                             # allocate space for formation energies
Mext   = np.hstack((Ma,np.eye(Ma.shape[0])[:,mcols]))   # extended matrix only non-refrence species
Es     = E_raw['Cu(111)'].values                        # raw DFT energies vector
dEf_a[mcols] = np.linalg.solve(Mext,Es)[len(ref_sps):]  # linear system

Formation energies

pd.DataFrame(zip(E_raw['Cu(111)'].values,dEf_a),index=Ma_.index,\
             columns=pd.MultiIndex.from_tuples([('Cu(111)','DFT'),('Cu(111)','Ef')]))

	Cu(111)
	DFT	Ef
*_h	-197765.085060	0.000000
*_s	-197765.085060	0.000000
CH2O*_s	-198613.816588	-0.605626
CH2OH*_s	-198629.895913	-0.823442
CH2O_g	-848.688366	-0.562464
CH3O*_s	-198630.805365	-1.732895
CH3O2*_s	-199197.450711	-0.304541
CH3OH*_s	-198646.958871	-2.024892
CH3OH_g	-881.686666	-1.837747
CO*_s	-198582.258153	-0.770208
CO2*_s	-199149.583594	-0.021949
CO2_g	-1384.476584	0.000000
CO3*_s	-199715.612772	2.022571
COH*_s	-198597.111176	0.238278
COOH*_s	-199165.053933	0.369220
CO_g	-816.402885	0.000000
H*_h	-197781.131496	-0.184928
H2CO2*_s	-199180.952833	0.331829
H2COOCH3*_s	-199480.172844	-2.974471
H2O*_s	-198364.322075	0.559701
H2O_g	-599.047721	0.748995
H2_g	-31.723017	0.000000
HCO*_s	-198597.497763	-0.148309
HCO3*_s	-199732.874257	0.622596
HCOH*_s	-198613.355603	-0.144641
HCOO*_s	-199165.814737	-0.391584
HCOOCH3*_s	-199464.223855	-2.886991
HCOOCH3_g	-1699.054584	-2.802780
HCOOH*_s	-199181.606326	-0.321665
HCOOH_g	-1416.294467	-0.094866
O*_s	-198331.439796	1.718963
OH*_s	-198348.124986	0.895282

Binding Energies from Reaction Energies#

Stoichiometr(y/ic) matrix#

The stoichiometry matrix, $M$, represents a graph encompassing the stoichiometry of each chemical species in every elementary step in which it is involved.
The dimensions of the stoichiometry matrix is $n_{reactions}\times n_{species}$ in a particular chemical system.

M_ = pd.read_json('./data/grabow/stoich_matrix.json')
M = M_.values
M_.head(5)             

	*_h	*_s	CO*_s	...	HCOOH*_s	HCOOH_g
CO2_g + _s -> CO2_s	0	-1	0	...	0	0
H2_g + 2_h -> 2H_h	-2	0	0	...	0	0
CO_g + _s -> CO_s	0	-1	1	...	0	0
H2O_g + _s -> H2O_s	0	-1	0	...	0	0
HCOOH_g + _s -> HCOOH_s	0	-1	0	...	1	-1

5 rows × 32 columns

Reaction energies are reference-freem and they can be estimated directly from raw DFT energies by the simple matrix-vector multipliciation between the stoichiometry matrix and the DFT energies vector. $$M\times E_{DFT}=\Delta E_{r}$$ Here, we still use the Cu(111) GGA-level (PBE) energies as in the previous section.

dEr = M.dot(Es) # reaction energies

With the computed reaction energies, formation energies can be defined thrown the Moore-Penrose pseudo inverse of the stoichiometry matrix, $M^+$.

\[ E_{f}=M^+\times\Delta E_{rxn}\]

However, instead of computing the Moore-Penrose matrix inversion directly, we can first perform the singular-value decomposition of $M$ to confirm that the number of zero singular values equals the number of degrees of freedom in the chemical system.

U,s,Vt = np.linalg.svd(M)
plt.figure(dpi=100)
plt.plot(s,'.-',lw=0.5); plt.gca().set_xlabel('Singular Value #'); plt.gca().set_ylabel('Singular Value');

../_images/4ed08ad431f573ee619c0c731e70984368b9204c3b2a7fd377223776d1b2161b.png

That means we have 5 singular-values, as expected. Therefore, if split M into columns that refer to non-reference species, $M_{sp}$ and those that refer to reference species, $M_{ref}$, we end up with an invertible matrix. $$ E_{f}=M_{sp}^{+}\times\left(\Delta E_{r}-M_{ref}\times E_{ref}\right)$$ Which, in this parameterized version, allows for a direct relationship between formation energies and reference energies values.

Using this representation to solve for formation energies given the same reference species as in the previous section, we have.

dEf        = np.zeros(len(Ma_))                                          # allocate space for formation energies
dEf[mcols] = np.linalg.pinv(M[:,mcols]).dot(dEr-np.dot(M[:,mref],Eref))  # solving linear system through SVD

And below we can see that the formation energies computed by either methods are the same.

plt.figure(dpi=100)
plt.plot(dEf,dEf_a,'.'); plt.gca().set_xlabel('$\Delta E_{f}$ from $\Delta E_{rxn}$'); 
plt.gca().set_ylabel('$\Delta E_{f}$ from $\Delta E_{DFT}$');plt.gca().set_ylabel('$\Delta E_{f}$ from $\Delta E_{rxn}$ (Grabow)'); 
plt.plot([-3.75,2.5],[-3.75,2.5],'-',lw=0.5,c='grey',alpha=0.5)
plt.gca().set_xbound([-3.75,2.5]); plt.gca().set_ybound([-3.75,2.5])

../_images/b30342aeb31e5f38da3f5d3a5abb79d8783de7a87138137bf0558fe70eb92bd3.png

Comparing Grabow’s vs. DFT energies#

Now we can use the same rereferencing scheme, starting from reaction energies on Grabow’s paper data.

dEr_grbw = raw_data['dE'].values # Grabow's reaction energies

dEf_grbw        = np.zeros(len(Ma_))                                          # allocate space for formation energies
dEf_grbw[mcols] = np.linalg.pinv(M[:,mcols]).dot(dEr_grbw-np.dot(M[:,mref],Eref))  # solving linear system through SVD

plt.figure(dpi=100)
plt.plot(dEf,dEf_grbw,'.'); plt.gca().set_ylabel('$\Delta E_{f}$ from $\Delta E_{rxn}$ (Grabow)'); 
plt.plot([-3.75,2.5],[-3.75,2.5],'-',lw=0.5,c='grey',alpha=0.5)
plt.gca().set_xbound([-3.75,2.5]); plt.gca().set_ybound([-3.75,2.5])
plt.gca().set_xlabel('$\Delta E_{f}$ from $\Delta E_{rxn}$ (own work)');

../_images/7857336ebbf809726bf6e8d37d100c2ae2fdd05206f7fc9faf1095c2a482de7c.png

There is large variation for larger moleculecules, lower end in formation energies, which refer to methyl-formate, longer chain molecules. Such discrepancies might be either due to wrong final geometries, not accounting for long-range interaction (vdW) or even due to corrections that might have been applied by Grabow and were not included in this analysis.

plt.figure(dpi=100)
plt.hist(dEf-dEf_grbw,bins=16); plt.gca().set_xlabel('$\Delta\Delta E_{f}$ from $\Delta E_{rxn}$ (own-Grabow)'); 

../_images/a444e209a88b915299888fe40357158998f3bc48c73e08ad1dda8844917d0995.png