Different (larger) SCF energy compared to other software (Orca, C4) when using an augmented (aug-cc-pVDZ) basis set

Hello,

I have encountered the following discrepancy in the total SCF energy between QChem and other software (Orca, C4) when an augmented (aug-cc-pVDZ) basis set is used. The QChem SCF energy of a molecule is:

-917.30825055

the SCF energy for the same geometry and basis set obtained with Orca is:

-917.30830690

The difference is too large to be due to numerical accuracy. Tightening the S2THRESH and THRESH thresholds does not help …

This is not the case when using a basis set without diffuse functions, for instance, for the cc-pVDZ basis, the QChem SCF energy is consistent with that obtained with different software:
-917.25048977 (QChem), -917.25048979 (Orca)

Best regards,
Evgeniy

Here is the QChem input:

$molecule
-1 1
F -2.47368223281418 2.80542728052264 0.00000000000000
C -2.33005485594229 1.46862084047293 0.00000000000000
C -3.56162073868157 0.71406902149433 0.00000000000000
O -4.68525614542407 1.22001050590180 0.00000000000000
C -1.08908790429276 0.93919035038310 0.00000000000000
C -0.91128327824508 -0.46521382888149 0.00000000000000
C -2.07888797991049 -1.26202545035880 0.00000000000000
C -3.30795039619975 -0.70334017621046 0.00000000000000
F -4.39365576088541 -1.49857540790821 0.00000000000000
C 0.33862192851016 -1.09353980235356 0.00000000000000
C 1.60703790100785 -0.58431378150168 0.00000000000000
C 2.78133961487714 -1.42752738981520 0.00000000000000
O 2.94623439646805 -2.63951617199486 0.00000000000000
N 3.82514570126253 -0.50115670822928 0.00000000000000
C 5.20318781217063 -0.87463424735375 0.00000000000000
C 3.27753514370613 0.76211143898986 0.00000000000000
N 1.99312647467926 0.75926371636068 0.00000000000000
C 4.13050417265683 1.97085910160822 0.00000000000000
H -0.22496779993189 1.58459926313070 0.00000000000000
H -1.99511075817576 -2.34012061308032 0.00000000000000
H 5.72271910839518 -0.50956263241161 -0.88515098609478
H 5.22376982390643 -1.96127271629799 0.00000000000000
H 5.72271910839518 -0.50956263241161 0.88515098609478
H 3.48806213050252 2.84402933313660 0.00000000000000
H 4.77482845506263 2.00286559198461 -0.87854761232939
H 4.77482845506263 2.00286559198461 0.87854761232939
H 0.32565941918173 -2.17848932246953 0.00000000000000
$end

$rem
MEM_TOTAL = 10000
MEM_STATIC = 2000

symmetry true

s2thresh 20
thresh = 20
scf_convergence = 8

method = hf

basis = aug-cc-pVDZ
basis2 = cc-pVDZ
scf_final_print = 5
$end

Please check that PURECART is the same in both codes (pure vs. cartesian spherical harmonics) - is it the same number of basis functions? Also check the contraction coefficients as some of the Dunning basis sets have multiple contraction schemes that are in use. With Q-Chem you can use PRINT_GENERAL_BASIS = TRUE to see what these are. Finally, check whether any functions are removed based on the linear dependency tests.

Also note that values of THRESH and S2THRESH > 16 are meaningless (they go to 16), because double precision gives you only about 15.5 decimal digits of precision.

Many thanks for your response! The number of basis functions is the same (correct). If some basis function(s) is (are) removed based on the linear dependancy tests, is there any notification about that in the output?

After tightening the BASIS_LIN_DEP_THRESH = 20, I got nearly the same SCF energy as in other programs, -917.30830575 (QChem) vs -917.30830690 (Orca) …

Kind regards,
Evgeniy

The bottom line is that I don’t think you should be concerned about this discrepancy. You want to remove the linear dependencies because they can seriously hamper SCF convergence for medium-sized molecules, especially when diffuse functions are present. In Q-Chem, this is indicated by a statement along the lines of

 Smallest overlap matrix eigenvalue = 9.21E-07
 Linear dependence detected in AO basis
 Number of orthogonalized atomic orbitals = 494

This is for BASIS_LIN_DEP_THRESH = 6 (meaning 1e-6, which is the default). The original calculation had 495 basis functions so exactly 1 is being removed. That’s not too bad, probably wouldn’t have hampered convergence much, but I’ve seen worse cases in larger molecules where disabling this feature (by setting BASIS_LIN_DEP_THRESH = 20) is a bad idea.

Many thanks! I see your point. Yes, there is a linear dependence due to diffuse fuctions, but it is not so severe, SCF can converge without problems. And I just wanted to know the reason of that descrepency. :slight_smile: And to get rid of linear dependences, something like aug-ano-cc-pVDZ could be used, I guess.

(1) The default threshold to remove linear dependent MOs in Q-Chem is 1e-6. This threshold is also adopted by Gaussian and GAMESS, but not by ORCA. You can add sthresh 1e-6 in ORCA .inp file (where the default value in ORCA is 1e-7). My suggestion is always using 1e-6 in ORCA, and not to use 1e-7. It happens that there is only one linear dependent MO in the current problem, and it seems the RHF calculation is not affected much. But if you want to perform further calculations like post-HF methods, I’ll still suggest you set 1e-6 in ORCA.

(2) There is no need to run an independent RHF calculation in ORCA. In that way, you spent much time to do trial and error, and the computational cost of SCF iterations. You can obtain the converged RHF solution in ORCA immediately, without any trial and error.

To achieve this, we can simply transform/convert MOs from Q-Chem to ORCA. We can add the keyword gui = 2 into the Q-Chem input file (.in), e.g.

$rem
method = hf
mem_total = 10000
thresh = 12
scf_convergence = 8
basis = aug-cc-pVDZ
basis2 = cc-pVDZ
sym_ignore true
gui = 2
$end

After the Q-Chem job is accomplished, we obtain the file test.FChk (maybe test.fch for newer versions of Q-Chem). Now we run the following Python script

from mokit.lib.qchem import standardize_fch
import os

old_fch = 'test.FChk'
std_fch = 'test_std.fch'

standardize_fch(old_fch) # generate std_fch
os.system('fch2mkl '+std_fch)
os.system('orca_2mkl test_std_o -gbw')

to generate ORCA input file test_std_o.inp and and binary wave function file test_std_o.gbw. If we submit them to ORCA, we’ll find the SCF is converged in 1-2 cycles

----------------------------------------D-I-I-S--------------------------------------------
Iteration    Energy (Eh)           Delta-E    RMSDP     MaxDP     DIISErr   Damp  Time(sec)
-------------------------------------------------------------------------------------------
               ***  Starting incremental Fock matrix formation  ***
                              *** Initializing SOSCF ***
---------------------------------------S-O-S-C-F--------------------------------------
Iteration    Energy (Eh)           Delta-E    RMSDP     MaxDP     MaxGrad    Time(sec)
--------------------------------------------------------------------------------------
    1    -917.3082517096822812     0.00e+00  5.64e-05  4.28e-04  3.98e-08    10.2
               *** Restarting incremental Fock matrix formation ***
    2    -917.3082517097236632    -4.14e-11  9.59e-06  7.30e-05  7.15e-08     9.5
                  *** Gradient check signals convergence ***

               *****************************************************
               *                     SUCCESS                       *
               *           SCF CONVERGED AFTER   2 CYCLES          *
               *****************************************************

which saves the time for trial and error, and the SCF iterations in ORCA.

Thanks for your help. I agree that the default linear dependency threshold in ORCA is too tight and that has caused us problems with SCF convergence in that code, in the presence of diffuse functions. Probably there’s a mismatch with other thresholds, which in principle ought to be the square of the smallest eigenvalue of the overlap matrix. Q-Chem does print a warning to this effect, which most users ignore (in my experience), but you ignore it at your peril sometimes.

Thank you @jherbert and @jxzou-MOKIT for your replies! Linear dependence of the basis functions (LD) is a know problem in quantum chemistry. Well, if it is just one basis function that causes the LD, then it is porbably worth removing it to get rid of the problem. :slight_smile:
Thank you @jxzou-MOKIT for the trick with converting orbitals from the QChem to Orca format. That is useful :slight_smile: