How do the implicit solvation models in Q-Chem rank in terms of computational effort?
Of the ones a person might want to use (and here I am discounting the Kirkwood-Onsager multipolar model because it uses a spherical cavity), all are formally cubic-scaling because they require the solution of a set of linear equations. That said, it's O(N^3) with a very small prefactor, so for small molecules you won't even notice it, and for medium-size molecules it should still be a small fraction of the SCF time per cycle.
Large molecules can be an issue. For PCMs, there is a conjugate-gradient solver for the linear equations that kicks in for a sufficiently large number of discretization points (or it can be turned on at any time using keywords in the $pcm section). Low-memory, linear-scaling features are implemented for it, but I won't claim that the default settings are well optimized, because this feature isn't widely used. For large molecules you may want to reduce the number of discretization points (the HPoints and HeavyPoints keywords in $pcm): the defaults are intended to ensure ~1 kcal/mol convergence of the solvation energy, but for a large molecule one is probably not interested in the absolute solvation energy, rather in conformational dependence, or perhaps just in using something other than vacuum boundary conditions.
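A minimal sketch of what such an input might look like (the method, basis, and point counts below are illustrative placeholders, not recommendations; HPoints and HeavyPoints are the $pcm keywords mentioned above):

```
$rem
   METHOD          b3lyp     ! placeholder
   BASIS           6-31G*    ! placeholder
   SOLVENT_METHOD  pcm
$end

$pcm
   Theory       CPCM
   HeavyPoints  110    ! Lebedev points per heavy-atom sphere, reduced from the default
   HPoints      110    ! Lebedev points per hydrogen sphere, reduced from the default
$end
```

Reducing the per-sphere grids is the main lever for cutting PCM cost on large systems, at some expense in the absolute solvation energy.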
In trying to use the SMx models for proteins (at the semi-empirical HF-3c level of theory), we encountered some hard-coded size limits. I eliminated some of those (starting with Q-Chem 5.4, I believe), but there remains a fairly sizable memory bottleneck that I cannot fix without a major rewrite of the old Cramer/Truhlar Fortran code, which I am not going to do.
Lastly, there is the isodensity implementation of SS(V)PE, and with it the CMIRS model. That is not intended (in its present form) for large molecules, because the available isodensity cavity-construction algorithm will fail if the molecule has a complicated shape. ("Complicated" is explained in the manual.)
As always, I will put in a plug for my review article on solvation models, where all of these issues are addressed: https://doi.org/10.1002/wcms.1519
Thanks John. I appreciate your reply.
I re-read your review article and re-watched the webinar you gave. While they were a good explanation of the methods and their application, I was looking for more info on which methods were most computationally demanding and which were not.
What size limit does one encounter wrt “fairly sizable memory bottleneck” with SMx? What is the lower limit on HeavyPoints and HPoints to retain meaningful SCF energy accuracy for PES examination?
For a DNA oligomer with 506 atoms, 2512 shells and 5484 basis functions, using:
It takes 15 wall hours to get SCF convergence for a single point.
I’m going to try SMD and see what that does.
(1) SMD uses PCM electrostatics (combined with SMx-style non-electrostatic terms), so the performance keywords for PCM apply here as well. That is unlike SM8 and SM12, which use Generalized Born electrostatics and do not use the PCM code at all.
(2) I think I once convinced myself that 110 Lebedev points per sphere is sufficient for large molecules, but your mileage may vary. It certainly won't be converged to the limit with N = 110 (there are small-molecule convergence tests in the literature). For large systems, however, you have to balance that against the rather limited level of SCF theory (including small basis sets), and I don't think it is necessary or advisable to push the PCM discretization grid to the limit while those other approximations remain significant.
(3) From comments that I left in the code, it looks like the remaining memory bottleneck is 3 × NAtoms × NB2 doubles (8 bytes each), where NB2 is the number of significant function pairs. (I eliminated a bottleneck twice that size in July 2021, so that fix would have appeared in 5.4.2, but the one that remains is structural: removing it would require a major redesign of code that I am only vaguely familiar with and that, to the best of my knowledge, no one is developing.) That bottleneck is fairly large for big systems. You can see NB2 in a printout near the beginning of a Q-Chem job, a few lines after the coordinates, in a line that reads "There are XYZ function pairs". There is no way to reduce it; it is fixed by the number of atoms and the basis set. It feels like a design choice made for convenience, under the assumption that the code would only ever be used for small molecules.
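To put the 3 × NAtoms × NB2 figure in perspective, here is a back-of-the-envelope estimate. The NB2 value used below is a made-up illustrative number; for a real job, read the actual count from the "There are XYZ function pairs" line of your output:

```python
def smx_memory_estimate_gb(n_atoms: int, n_pairs: int) -> float:
    """Estimate the SMx memory bottleneck: 3 * NAtoms * NB2 doubles, 8 bytes each."""
    bytes_needed = 3 * n_atoms * n_pairs * 8
    return bytes_needed / 1e9  # decimal gigabytes

# 506-atom DNA oligomer from the post above; NB2 = 1,000,000 is hypothetical.
print(f"{smx_memory_estimate_gb(506, 1_000_000):.1f} GB")  # prints "12.1 GB"
```

Even a modest (hypothetical) pair count already puts this array in the tens-of-gigabytes range, which is why the bottleneck bites for systems of this size.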