Memory usage with DFT opt + Solvation

adrea.snow · May 13, 2023, 5:14am

Hi all,
I’m trying to perform some DFT optimisations with SMD + IEFPCM and am running into memory issues when using hybrid DFAs (specifically CAM-B3LYP and wB97X-D3, since the jobs will be extended to TDDFT, and these DFAs are the most appropriate).

$rem
    MEM_STATIC             65536
    MEM_TOTAL              262144
    GUI                    2
    JOBTYPE                OPT
    METHOD                 wb97x-d3
    BASIS                  aug-cc-pvdz
    XC_GRID                3
    SOLVENT_METHOD         SMD
$end

$smx
    solvent                hexane
    print                  2
$end

$pcm
    Theory                 IEFPCM
$end

My testing molecule is azulene (302 basis functions in this basis set), and I’m giving the job 256GB of memory and 16 cores, but the job always errors with:

Calculating analytic gradient of the SCF energy

Not enough memory for j_fock_nuc_grad_batch::compute_pkd()
Q-Chem fatal error occurred in module progman/main.C, line 185:
Problem running Q-Chem: egrad_rs_k::compute(): Error found!
Please submit a crash report at q-chem.com/reporter

I’ve tried increasing MEM_STATIC, INCORE_INTS_BUFFER, and INTEGRALS_BUFFER without any success, and while RIJK helps, RI is not implemented in PCM yet.

The error also persists without any solvent model, and even with molecules as small as benzene.

Is there another memory variable I’m missing here? when performing the same calculating in ORCA (with CPCM instead of IEFPCM), the memory requirements for this job are closer to 32GB total, even when explicitly switching off all RI approximations.

Thank you,
Adrea

jherbert · May 13, 2023, 6:17am

MEM_TOTAL is really the only memory variable that you need for this, but are you sure this is related to PCM? Does the problem go away with SOLVENT_METHOD = 0? To me, it looks like an SCF memory problem not a PCM memory problem.

adrea.snow · May 13, 2023, 6:20am

I don’t believe it’s a PCM issue at all, since the problem persists without any solvent model, though I’m not sure how I can get past the problem without the use of RI approximations since they’re not supported by PCM yet.

jherbert · May 13, 2023, 7:40am

You should not need RI for 300 basis function DFT. I believe this may be related to a bug whereby the memory requirement for the DFT quadrature grid is calculated incorrectly; please contact Q-Chem support. In the meantime the workaround is to use a smaller grid. The default for that functional is SG-2 and (speaking as the person who set the default), it’s borderline whether that’s required in this case; even SG-1 is probably fine.

adrea.snow · May 13, 2023, 7:44am

I’ll definitely send them an email in that case. I have tried SG-1 though and it was still erroring out.

btw, thank you so much for your review on solvation models. It’s become my go-to, and the inspiration for my phd

jherbert · May 14, 2023, 5:10am

Thanks for saying so. There is a lot of jargon in that field (even by normal QC standards), and I was trying to clarify some of it because most papers don’t discuss how their favorite method relates to others in the same ecosystem. (That problem is more general than PCM…)

Regarding your memory problem, I feel this must be some kind of bug as I just tried this calculations for ~700 basis functions without such a problem, using latest Q-Chem trunk. Some tips:
(a) use default grid or even SG-1
(b) I would use C-PCM rather than IEF-PCM. People say that the latter is “better” for low-dielectric solvents, and on paper there’s a bit of truth to that but in practice the difference is well within the error bars of the continuum approach. C-PCM is simpler, thus faster and (more importantly) implemented with more other features as compared to IEF-PCM.

adrea.snow · May 14, 2023, 6:11am

Thanks for the follow up. The field really is a mess, especially for excited state solvation, which is where I’m trying to make my mark. The overlapping and overstating method names made it extremely difficult to explore all the available methods. (It’s a fantastic method, but whoever named VEM really wasn’t thinking about online searchability )

I sent an email through to qchem support as well, but it looks like the bug must have already been fixed (I’m on 6.0.2).

I was trying to run these calculations to quantify how much of an influence on geometry IEFPCM based SMD has over CPCM based. I figured it’s a decent check since most of the solvents I’m working with at the moment are all ε<10 and I’ve been using SMD in ORCA which is only CPCM. As it turns out though, IEFPCM (or SSVPE) based SMD seems to have other issues in qchem 6.0.2 anyway, so I’m comparing pure CPCM to IEFPCM instead.

The simple fix I found around the memory issues is just to allocate MEM_TOTAL = 100TB and set reasonable limits within slurm, which seems to be working.

My supervisor has the viewpoint of “always use the largest grid you can afford until you know for sure that it’s not going to influence your calculations” so I know she’d never be happy with SG-1.

jherbert · May 14, 2023, 6:43pm

A couple of things:
(a) I don’t think there’s any sense in using a grid that’s excessively large. Q-Chem defaults are well tested are sensible unless you are using exotic custom basis sets (e.g., with extremely diffuse functions).
(b) Are you sure that IEF-PCM is implemented for use with SMD? What does the manual say? I don’t recall that this was implemented, perhaps no one put a backstop in the code.
(c) If you want to see the difference between IEF-PCM and C-PCM, just use SOLVENT_METHOD = TRUE and compare. That will show you the difference in electrostatic energies (G_elst in my notation, or what the SMx papers call ENP). To that, you can add the non-electrostatic part (G_nonelst or CDS) from a SMD calculation with C-PCM electrostatics.

adrea.snow · May 15, 2023, 12:22am

I inferred the possibility of IEF-PCM for SMD from section 11.2.9.3 of the manual:

" An SMD energy or gradient calculation is requested by setting SOLVENT_METHOD = SMD in the $rem section. While Q-Chem users can vary the parameters for the IEF-PCM part of the SMD calculation, this should be done with caution because a modified IEF-PCM electrostatics might be less compatible with CDS parameters and thus lead to less accurate results."

I’m more specifically interested in whether or not the underlying PCM theory has a significant change on geometries at low vs high dielectric solvents. I’ve settled on comparing IEF-PCM and CPCM solvated opts for my dataset on n-henxane and dmso, and seeing whether or not the CPCM/IEF-PCM energetic and geometric RMSD differences are drastically different between the two solvents.

While I’d like to be using IEF-PCM-SMD for the most robust model for low dielectric solvents, I’m currently trying to evaluate if my CPCM-SMD optimisations from ORCA (That I’ve spent months calculating) will be giving reasonable structures to then follow up with single points in Q-Chem with more advanced solvent models for the final energetics.

jherbert · May 15, 2023, 7:09pm

Thanks, wouldn’t be the first feature I’ve forgotten we have, and knowing something about the internal structure of the code I can’t imagine why that wouldn’t be in place. My very strong intuition is that this won’t make any difference for geometries. That’s based on a couple of things, (a) geometries are much less sensitive to level of theory in general, as compared to relative energies; and (b) IEF-PCM is simply not that different of a model from C-PCM, even in low-dielectric solvents. If it were my student, I would tell them not to bother with IEF-PCM (despite it being my group who implemented it).

That said, since SMD with IEF-PCM is an advertised feature, if it’s not working then that’s a bug that we should address. If you can post a complete input (or email it to me if you prefer), which demonstrates how this is not working, then I will create a ticket in our system. Please stipulate which version of Q-Chem you are using.

adrea.snow · May 15, 2023, 11:47pm

I’ve been running a test and my simple results are showing energy differences of up to 0.12 eV (MSE 0.042 eV) in the ground state for rigid fluorophores as optimised in pure (no SMD) CPCM vs IEFPCM with n-hexane. Geometric differences are up to 0.2 Å RMSD (MSE 0.005Å). I’ll have to do further testing on TD-DFT excitations to see how they translate to vertical excitations, but for the dataset building work I’d definitely like to quantify the influence of the underlying theory.

For SMD-IEF-PCM opt gradient issues running in Q-Chem 6.0.2:

32GB of memory was allocated in SLURM, but MEM_TOTAL = 99999999 is required to get past the grid memory estimator bug.

Input:

$molecule
0 1
  C   1.91557601595939     -1.26642442314919     -0.00002197149021
  C   2.50467931451576     -0.00001282459289     -0.00000061017151
  C   1.91558242618290      1.26640538602792      0.00002223613492
  C   0.55992253917947      1.59531044114964      0.00004298922194
  C   -0.54432921773595      0.74592313330616      0.00003380201587
  C   -1.89124751341000      1.14936312290632      0.00005979166168
  C   -2.69657815138483      0.00000241596941      0.00000943205409
  C   -1.89125439269356     -1.14935957104026     -0.00004385810299
  C   -0.54433273572621     -0.74592261681773     -0.00002285953999
  C   0.55991044157298     -1.59531708651117     -0.00003725534593
  H   2.60894126878788     -2.10950888793139     -0.00003365780391
  H   3.59767753567135     -0.00001614949140     -0.00000213695058
  H   2.60895462892106      2.10948468937385      0.00003213870190
  H   0.32792261215946      2.66414377977702      0.00007235359050
  H   -2.23640033232671      2.18144296835914      0.00010113085119
  H   -3.78653653225096      0.00000660740139      0.00001090707570
  H   -2.23640810726794     -2.18143840397550     -0.00008438628012
  H   0.32790257704588     -2.66414906006132     -0.00006538712253
$end

$rem
    MEM_TOTAL              99999999
    GUI                    2
    JOBTYPE                OPT
    METHOD                 wb97x-d3
    BASIS                  aug-cc-pvdz
    XC_GRID                2
    SOLVENT_METHOD         SMD
$end

$smx
    solvent                hexane
$end

$pcm
    Theory                 IEFPCM
$end

I’ve removed the bulk of the gradient matrices for simplicity, but here is the error:

Calculating analytic gradient of the SCF energy
 Gradient of CDS energy
            1           2           3           4           5           6
    1   0.0001528   0.0002356   0.0001528  -0.0002569   0.0001126   0.0001166
...
 Gradient of SCF Energy
            1           2           3           4           5           6
    1   0.0000947  -0.0001310   0.0000935   0.0006050  -0.0003928  -0.0013416
...
 Max gradient component =       3.444E-03
 RMS gradient           =       9.238E-04
 Computing fast IEFPCM-SWIG gradient.

 Q-Chem fatal error occurred in module libmdc/newfileman.C, line 376:

 FileMan error: End of file reached prematurely reading (10512) bytes in file FILE_PCM_ELEC_ESP
 Path: /scratch/sn29/asnow/qchem/qchem33835/839.0


 Please submit a crash report at q-chem.com/reporter

jherbert · May 16, 2023, 12:46am

(1) I can reproduce this crash with Q-Chem trunk (pre-release v. 6.1.0), so it seems there is a bug in SMD(IEF-PCM) optimizations. I will make a ticket.

(2) Regarding the memory settings, Q-Chem doesn’t know anything about SLURM (the latter is a request to the batch scheduler and the OS, not the software to be run). For a large-memory job, you therefore need to request large memory in both places, SLURM file and Q-Chem input. That said, the fact that this is a large-memory job does feel like a bug, as I’ve said from the beginning. I do recall there was a semi-recent bug where the memory required for DFT quadrature was overestimated by Q-Chem (leading to much larger memory requirements than what one would estimate “on paper”), wasn’t clear if we’d shipped a version with that bug or if it just came up in internal development but could have been the former. Seems to have been fixed because if I comment out MEM_TOTAL in your input (thus run with default memory request), it crashes in the same way, at the first gradient as in (1).

(3) Regarding geometry differences between IEF-PCM and C-PCM, I remain skeptical but please, test away, feel free to prove me wrong. Your very small mean error feels right, perhaps the outliers are floppy side chains whose absolute position is not well-determined? I also feel that 0.1 eV is well within the margin of error for these methods. Those kind of differences may be meaningful under very controlled conditions (substituting one functional group for another, or changes along a potential surface), but please keep in mind that you have made a variety of more-or-less arbitrary choices that could easily contribute differences on the order of 0.1 eV or more. For example, what if you used a triple-zeta basis set instead? Or, you are constructing a cavity using Bondi radii scale by 1.2, and there’s nothing magic about exactly 1.2, what if it were 1.1 or 1.3? What if you used a different (but still reasonable for TDDFT) functional? Any one of those things probably changes vertical excitation energies by 0.1 eV or more.

Not trying to discourage you, just providing friendly free advice. Basis set, functional, and cavity construction in your example are all reasonable choices for TDDFT (although I probably would have used Pople basis, for efficiency), but none of those choices is definitive. That uncertainty in the nature of the beast for electronic spectroscopy in complex systems.

adrea.snow · May 16, 2023, 1:32am

(1) Perfect, and thank you

(2) Oh yeah, this is just to bypass the memory bug in 6.0.2, but I wanted to clarify that when providing my input file. Actually memory usage as reported by time -v is only about 2GB

(3) Thank you for the pointers, and I completely agree.
My dataset will ultimately be experimental; a matrix of computationally studyable fluorophores in a series of solvents with mixed properties, to get a systematic series of useful observables. Absorbance, emission and excitation spectra (done to a neurotic level of error reduction), and TCSPC fluorescence decays will all be published and open for computational people to refer back to - https://github.com/adreasnow/fluorophore-data

I’m very open to suggestions on this work btw

In subsequent analysis I’m going to be experimenting with solvent models to figure out what systematic errors we can potentially try and identify that might otherwise be missing and see if we can improve solvent models for excited states beyond that of VEM and SMSSP

jherbert · May 16, 2023, 9:20am

I think have a fix for the SMD(IEF-PCM) bug. Need to run the full battery of tests to make sure it doesn’t break something else, but I probably can (just) squeeze it into the upcoming 6.1 release in a few weeks. Unfortunately, the nature of the bug is such that there’s no workaround for older versions.

adrea.snow · May 17, 2023, 12:16am

Perfect, thank you

jherbert · May 24, 2023, 10:44am

Just to follow up, the gradient for SMD with IEF-PCM electrostatics has been fixed for the v. 6.1 release.

adrea.snow · May 28, 2023, 2:53pm

I just realised that this repo was set to private, but it should be accessible now