Best method for large systems

jimkress · February 15, 2025, 8:37pm

I am currently trying to do DFT calculations on some very large systems (thousands of atoms, tens of thousands of basis functions). The Q-Chem manual is a bit confusing in its discussion of methods for large systems like these. What is the recommended methodology for performing DFT calculations on such systems?

Specifically, I am doing:

2167 atoms (a DNA double helix ion neutralized with Na+ atoms)
There are 5586 alpha and 5586 beta electrons
Requested basis set is def2-SVP
There are 10938 shells and 23890 basis functions
14247 external point charges
METHOD wB97M-V

Additional info:

System is RHEL 8.10
2TB RAM
Dual AMD EPYC 7H12 64-Core Processor
5 TB SSD
4 RTX 3090 GPUs
Q-Chem 6.2.1 for Intel X86 EM64T Linux
CUDA 11.8

Suggestions would be appreciated.

jherbert · February 16, 2025, 4:06pm

In terms of hardware that’s probably pretty close to optimal for maximizing DFT performance using Q-Chem. In terms of method choice, that depends on what you want.

For functional, wB97M-V is, statistically speaking, arguably the best all-around functional right now so that’s a good choice for final energetics calculations.
https://doi.org/10.1080/00268976.2017.1333644
However, it’s a meta-GGA and also B97-based and both those aspects make it more expensive than some other choices (and also more sensitive to basis set). If you can afford it, I would recommend wB97M-V/def2-TZVP as your final energetics level of theory (e.g., to get reaction energies or barrier heights). For structure optimizations, however, wB97M-V is probably not needed and I would go with something like wB97X-D/def2-SV(P), or even wB97X-D/6-31G* unless you have atoms that aren’t supported by the latter. That will be a lot faster and probably just as good for geometries. For large systems, be careful with thresholds as the Q-Chem defaults are a bit too loose in my experience.
https://doi.org/10.1021/acs.jpca.4c00283
The Q-Chem office is working on this issue for the v. 6.3 release but in the meantime, my group typically sets THRESH=12 for large systems. For optimizations Q-Chem defaults to THRESH=11 which isn’t too bad but for single-point calculations I think the default is too loose, it can actually slow the calculation down (relative to tighter thresholds) by requiring more SCF cycles.

jimkress · February 16, 2025, 11:18pm

Thanks John,

The objective of our research is the understanding of downrange, nonlocal effects on gene activation of proteins binding to DNA. We are doing single point calculations on snapshots of MD trajectories. We are not doing structure optimizations with Q-Chem.

It is assumed that the -V variant of the functional evaluated at each step of the SCF so that it provides a molecular density that reflects the nonlocal effects, as opposed to the D3 approach which is purely additive to the energy at the final converged energy.

Is there a different/ better way to have the nonlocal effects of the system included in our calculations using a less demanding functional and basis set? In my reading it appears that the -V variant is referred but Q-Chem only supports it for the wB97M-V functional and dev-2 SVP is the lowest level of basis set that provides a reasonable answer.

Thanks for your suggestions.

The TZVP basis set would explode the magnitude of the calculation beyond our ability to perform. So that is not really an option at this time. We haven’t yet reached the largest system that needs examination.

As soon as our current calculation completes (1.5 days and running) we’ll try the THRESH=12 as you suggested.

Thanks again.

jherbert · February 17, 2025, 12:05am

The -V indicates nonlocal VV10 correlation, which is mostly a correction for dispersion, other parts of that functional remain semilocal (except for Hartree-Fock exchange). So any nonlocality that you see is likely to be extremely short-range in real space, decaying nominally as R^(-6). Same goes for wB97X-V. That’s a hybrid GGA whereas wB97M-V is a hybrid meta-GGA. The latter is slightly more accurate overall (but also slightly more expensive).

jimkress · February 17, 2025, 10:31pm

Thanks again John,

I appreciate your summary. However, I am still rooting through the QChem6.2 manual to find a preferred method for dealing with large systems like mine. In particular, I find the following:

The user manual for Q-Chem version 6.2 provides several methods for handling large molecules in computational chemistry:

Parallelization:

Shared and distributed memory parallel jobs using the Cyclops Tensor Framework (CTF).

Memory Keywords:

MEM_STATIC, CC_MEMORY, and CC_BACKEND for specifying memory usage.

Single-Precision Arithmetic:

Reduces memory footprint and execution time, controlled by the CC_SINGLE_PREC keyword.

Q-CHEM Methods:

Continuous Fast Multipole Method (CFMM) for electron-electron Coulomb interactions. 
Linear Scaling Exchange (LinK) for exact exchange interactions. 
Incremental and Variable Thresh Fock Matrix Building. 
Fourier Transform Coulomb Method (FTC). 
Resolution of the Identity Fock Matrix Methods (RI-JK). 
PARI-K Fast Exchange Algorithm. 
occ-RI-K Exchange Algorithm. 

Additional Methods for Large Molecules:

Projection-Based Embedding. 
Approximate EOM-CC Methods. 
Single-Precision Arithmetic in EOM-CC and EOM-MP2 calculations. 
RI Approximation and Cholesky Decomposition.

Is there a preferred method or methods to use on a large system like mine? It’s not clear from the manual. It presents a number of possible things to try but there are no suggestion of evidence or tradeoffs.

Thanks for your help.

Jim

jherbert · February 18, 2025, 1:32am

The most relevant option in that list is RI-JK, for which there has been significant improvement in recent versions of Q-Chem. I recommend trying that and if it doesn’t work for you (either fails or doesn’t result in significant speed-up), post it here or – better yet given the size of your jobs – get in touch with Q-Chem customer support. I think they might be interested to have some real-world use cases as this is a functionality that we are actively trying to improve.

jimkress · February 19, 2025, 6:37am

Thanks John,

I tried your suggestions on two systems called smaller and larger. The RIJK results for the smaller system take more time than that without the RIJK. For the larger system, the use of RIJK results in a segmentation fault.

Here is a summary of the results:

Smaller run (baseline)
    6  -14449.4308592637      5.88e-05  
 Brianqc Computing J only
BrianQC JK build time 3.80e+01 (s)
 Brianqc Computing K only
BrianQC JK build time 3.80e+01 (s)
BrianQC XC build time 9.00e+00 (s)
    7  -14449.4357684963      1.92e-05  
    8  -14449.4363240490      8.62e-06  Convergence criterion met
 ---------------------------------------
 SCF time:   CPU 24138.86s  wall 1170.00s 
 SCF   energy = -14449.43632405
 Total energy = -14449.43632405

Smaller run (baseline) - RIJK
    6  -14449.4308597573      5.88e-05  
 Brianqc Computing J only
BrianQC JK build time 4.20e+01 (s)
 Brianqc Computing K only
BrianQC JK build time 4.50e+01 (s)
BrianQC XC build time 6.00e+00 (s)
    7  -14449.4357705339      1.92e-05  
    8  -14449.4363284490      8.61e-06  Convergence criterion met
 ---------------------------------------
 SCF time:   CPU 28456.02s  wall 1219.00s 
 SCF   energy = -14449.43632845
 Total energy = -14449.43632845

Smaller run (baseline) - RIJK, THRESH=12
    6  -14449.4307069027      5.88e-05  
 Brianqc Computing J only
BrianQC JK build time 6.00e+01 (s)
 Brianqc Computing K only
BrianQC JK build time 6.20e+01 (s)
BrianQC XC build time 6.00e+00 (s)
    7  -14449.4356236142      1.92e-05  
    8  -14449.4361936557      8.61e-06  Convergence criterion met
 ---------------------------------------
 SCF time:   CPU 35339.81s  wall 1590.00s 
 SCF   energy = -14449.43619366
 Total energy = -14449.43619366

Larger run - RIJK, THRESH=12

/work/qchem6p2/bin/qchem: line 129: 1838328 Segmentation fault      (core dumped) ${QCPROG_S} ${inp} ${scr}
Error in Q-Chem run part 1
Error in the Q-Chem run

And the larger system for RHF, without RIJK and THRESH at its default

   17 -101697.3484498889      2.55e-05  
BrianQC JK build time 3.68e+03 (s)
   18 -101697.3487521224      2.46e-05  
BrianQC JK build time 3.98e+03 (s)
   19 -101697.3532386404      2.46e-05  
BrianQC JK build time 4.18e+03 (s)
   20 -101697.3532461978      2.45e-05  

It does not crash.  It just gets “stuck” after about 15 iterations.

The complete output files are available for the QChem teams use. I have sent them to support at Q-Chem.

jherbert · February 19, 2025, 10:54am

It’s possible that RI doesn’t play well with BrianQC GPU support. I don’t use the latter so I’m not sure.

jimkress · February 20, 2025, 4:55am

I’ll check with BrianQC and see what they think/

jherbert · February 20, 2025, 12:58pm

I suspect that the requisite RI integrals have not been ported to the GPUs. If so, the fastest modes of operation will be either (1) conventional DFT with GPU-accelerated integrals, which is what you have been doing, or else (2) RI-DFT on CPUs only. One or the other might be faster depending on system size, characteristics of the basis set, and probably hardware, so you would need to compare them yourself.