I am trying to understand how Q-Chem figures out the optimal number of basis functions to use for a given atom, given the basis set specified in the input file. For example, if I use 6-311G(d,p), how do I know how many basis functions it will use for carbon? And how would that change for another basis set?

The source of this question is as follows. I am using the pcJ-n basis sets to calculate NMR constants, because they were built specifically for that type of calculation. However, they make the calculation EXTREMELY slow and memory intensive. From the paper describing the development of the pcJ-n basis sets, I see that their advantage over other basis sets is that they need fewer basis functions to reach convergence. But if this is causing memory issues (which it is), could I instead increase the number of basis functions Q-Chem uses for, let’s say, pcseg-n, and get away without using pcJ-n?

The number of basis functions for each atom is determined unambiguously by the basis set itself; Q-Chem does not choose it. Each s-type shell provides one basis function, each p-type shell provides three, and so on. In general, a shell with angular momentum L contributes 2L+1 functions in the pure (spherical-harmonic) representation and (L+1)(L+2)/2 in the Cartesian representation, so the two representations first differ for d shells (5 versus 6 functions). You can therefore count the basis functions for a particular basis set and element by looking up its shell structure on the Basis Set Exchange (BSE).
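As a rough sketch, the counting rule can be written as a small Python function that takes a contracted shell structure in the usual bracket notation and returns the number of basis functions. The shell structures used in the example ([4s3p1d] for carbon in 6-311G(d,p) and [5s4p3d2f1g] for carbon in cc-pVQZ) are the commonly quoted ones, assumed here rather than taken from this thread; check them on BSE for the element and basis you actually use. The function names are illustrative only.

```python
import re

# Angular momentum L for each shell letter (s, p, d, f, then alphabetical, skipping j).
SHELL_L = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5, "i": 6, "k": 7}


def functions_per_shell(l: int, pure: bool = True) -> int:
    """Number of basis functions in one shell of angular momentum l."""
    if pure:
        return 2 * l + 1                # spherical-harmonic (pure) components
    return (l + 1) * (l + 2) // 2       # Cartesian components


def count_basis_functions(structure: str, pure: bool = True) -> int:
    """Count functions for a contracted shell structure such as '[4s3p1d]'."""
    total = 0
    # Each token is <number of shells><shell letter>, e.g. '4s' or '1d'.
    for count, letter in re.findall(r"(\d+)([a-z])", structure.lower()):
        total += int(count) * functions_per_shell(SHELL_L[letter], pure)
    return total


if __name__ == "__main__":
    # Assumed shell structures; verify against BSE before relying on them.
    for label, shells in [("C, 6-311G(d,p)", "[4s3p1d]"),
                          ("C, cc-pVQZ", "[5s4p3d2f1g]")]:
        print(label, shells,
              "pure:", count_basis_functions(shells, pure=True),
              "Cartesian:", count_basis_functions(shells, pure=False))
```

Under those assumptions, carbon gets 18 functions in 6-311G(d,p) with pure d shells (19 with Cartesian d shells) and 55 in cc-pVQZ (70 if every shell were Cartesian). Which representation a calculation actually uses depends on the basis set and input settings (in Q-Chem, the PURECART $rem variable is relevant), so check the manual rather than assuming either count.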

If you post an input here that’s particularly slow, it may be possible to give more specific suggestions for speeding up the calculation.

The pcJ-n basis sets are large, but they were developed in part for NMR calculations, which are known to be especially demanding of the basis set. You could look in the literature for “locally dense” basis sets, which are sometimes used in NMR calculations. The idea is to use a high-quality basis set on the nuclei for which you want NMR parameters and a lower-quality basis set on the remaining nuclei. This can be implemented in Q-Chem using BASIS=GEN and a $basis section. You will need to check the literature for which “lower-quality” basis sets are recommended. This approach may also mean running more than one calculation to get all the NMR parameters you want (changing which atoms get the high-quality basis set), but if it reduces the cost enough, it may still be a net win in computer time.
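When the nuclei of interest are a different element from the rest of the system, a BASIS=GEN input expresses this directly by giving each element its own entry in the $basis section. A minimal Python sketch that writes such a section, following the `Symbol 0` / basis name / `****` layout of the $basis section in the input file posted in this thread, might look like this. The helper name `make_gen_basis_section` is hypothetical, and the basis choices in the example simply mirror that posted input rather than being a recommendation. Because this layout assigns one basis per element, it cannot give two atoms of the same element different basis sets.

```python
def make_gen_basis_section(element_basis: dict[str, str]) -> str:
    """Build a $basis block for BASIS=GEN with one entry per element.

    element_basis maps an element symbol to a basis-set name, e.g. {"P": "pcj-3"}.
    Each entry uses the 'Symbol 0' / basis name / '****' layout from the
    $basis example posted in this thread.
    """
    lines = ["$basis"]
    for symbol, basis_name in element_basis.items():
        lines += [f"{symbol} 0", basis_name, "****"]
    lines.append("$end")
    return "\n".join(lines)


if __name__ == "__main__":
    # Large basis only on P (the nuclei whose couplings are wanted),
    # smaller basis on everything else.
    print(make_gen_basis_section({"Ca": "pcseg-1", "O": "pcseg-1", "P": "pcj-3"}))
```

Generating the section this way makes it easy to try several "lower-quality" choices for the remaining elements without hand-editing each input.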

If you are interested only in P-P spin-spin couplings, your basis choice appears reasonable. If you have Q-Chem 5.4, omitting BASIS2 will actually make the SCF run faster, since an automatically generated SAD guess is then used.

When I used an apatite unit cell in place of your $molecule, the performance was as expected for a system of this size. Could you be more specific about the slow performance you are observing? Is your main concern the SCF or the ISSC CPSCF step?

I see. To be honest, my main concern is not the slowness of the simulation but the amount of memory it takes. Almost every time, the simulation runs out of memory and I am unable to get an answer with the pcJ-n basis set. I have results from cc-pVQZ, but cc-pV5Z similarly gives me memory issues within Q-Chem itself. I posted a question about this very problem earlier (NMR calculation crashes with finer basis set).

But from what I gather from your comments, is that to be expected?

I posted a response to your cc-pV5Z question at the time. That one is not a memory issue; rather, it reflects that Q-Chem does not have integrals of high enough angular momentum for this particular application with that basis set. Quadruple-zeta worked fine.
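One way to see why going from quadruple- to quintuple-zeta matters here is to read off the highest-angular-momentum shell in each basis. A minimal sketch, assuming the commonly quoted oxygen structures [5s4p3d2f1g] for cc-pVQZ and [6s5p4d3f2g1h] for cc-pV5Z (check them on BSE): quadruple-zeta stops at g functions (L = 4), while quintuple-zeta adds h functions (L = 5). The exact angular-momentum limit that Q-Chem supports for spin-spin coupling jobs is not stated in this thread, so this only shows which shells each basis contains; the function name is illustrative.

```python
import re

# Angular momentum L for each shell letter (s, p, d, f, then alphabetical, skipping j).
SHELL_L = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5, "i": 6, "k": 7}


def max_angular_momentum(structure: str) -> tuple[str, int]:
    """Return (letter, L) of the highest shell in a structure like '[5s4p3d2f1g]'."""
    letters = re.findall(r"\d+([a-z])", structure.lower())
    top = max(letters, key=lambda letter: SHELL_L[letter])
    return top, SHELL_L[top]


if __name__ == "__main__":
    # Assumed shell structures for oxygen; verify against BSE.
    for label, shells in [("O, cc-pVQZ", "[5s4p3d2f1g]"),
                          ("O, cc-pV5Z", "[6s5p4d3f2g1h]")]:
        letter, l = max_angular_momentum(shells)
        print(f"{label}: highest shell {letter} (L = {l})")
```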

I also do not understand your complaint about memory. Taking the molecular structure from your previous query and the rest of the input from this one, I tried the following input file:

$molecule
0 1
Ca
Ca 1 3.408
Ca 1 3.397 2 60.008
P 3 2.946 1 54.819 2 294.1
O 4 1.498 3 138.358 1 217.5
O 4 1.637 3 50.386 1 305.8
O 4 1.637 3 50.165 1 129.5
O 4 1.637 3 104.129 1 37.8
P 3 2.946 1 54.819 2 65.9
O 9 1.498 3 138.358 1 142.5
O 9 1.637 3 50.386 1 54.2
O 9 1.637 3 50.165 1 230.5
O 9 1.637 3 104.129 1 322.2
$end
$rem
JOBTYPE ISSC
EXCHANGE B3LYP
BASIS gen
BASIS2 6-311G(d,p)
MEM_STATIC 8000
mem_total 80000
MAX_SCF_CYCLES 500
SCF_CONVERGENCE 6
$end
$basis
Ca 0
pcseg-1
****
O 0
pcseg-1
****
P 0
pcj-3
****
$end

(I’m not sure these particular memory settings are required, but they are not excessive for a modern workstation or a compute node on a cluster; both values are in megabytes, so this is roughly 8 GB static and 80 GB total.) On my system, this job runs to completion in 240 s on 40 cores.