Segmentation fault and Aborted

Hello, I’m trying to run TDDFT with B3LYP/6-31++G(d,p).
When I run TDDFT on many molecules, I get lots of ‘Segmentation fault (Error code 11)’ and ‘Aborted (Error code 6)’ errors, seemingly at random, especially for molecules with ~100 atoms.
I have no idea what causes this, since the same error always occurs even when I increase mem_total.
There is ‘SockError: Recv error’ in the error files of the PBS script, but I also don’t know which part of the input I should change to deal with SockError.
I use Q-Chem 4.4.

If you know of a way to fix this, or know of anything else that you think might be wrong, please let me know.

Here is one of my example input files:
$molecule
0 1
C 0.26249446 -0.89724243 -4.88596264
C 0.66811880 -2.19522337 -4.55092545
N 1.20656405 -2.49152621 -3.35536136
C 1.37094795 -1.50457001 -2.45624174
C 1.01613413 -0.16971274 -2.70807659
C 1.17955780 0.92716037 -1.67928255
C -0.22447702 1.17691510 -1.13368357
N -0.75880122 2.40983601 -1.30365871
C -1.98072344 2.60252964 -0.81523712
C -2.77500499 1.62545941 -0.12524365
C -4.02741307 2.30681147 0.23044284
C -5.18310593 1.79074688 0.95657422
C -5.26626285 1.93793745 2.34418739
C -6.41721802 1.48180790 2.99699432
N -7.45391375 0.92752297 2.34120554
C -7.38100266 0.81789390 1.00141975
C -6.26792449 1.24311129 0.26675064
C -4.00770391 3.59012256 -0.21256174
C -2.72153901 3.90812154 -0.91715908
C -2.21900435 0.35895555 0.02996395
C -2.82939386 -0.84027624 0.70376587
C -2.99517783 -0.59081588 2.20988569
C -3.42631266 -1.87713637 2.93164508
C -4.30909974 -2.77361144 2.04697154
C -5.09199045 -1.94484096 1.02353279
C -4.14214373 -1.27038674 0.02428024
C -0.92831802 0.15451288 -0.49244409
C 1.76391476 2.20558882 -2.33222151
C 3.23771475 2.03925299 -2.73307733
C 4.01614923 1.70179760 -1.46751712
C 5.53080145 1.49147510 -1.58033997
C 5.85090128 0.66226520 -0.31297389
C 4.57308251 -0.04959959 0.07781952
C 4.49592988 -0.91325486 1.09703623
C 5.66619715 -1.17084129 2.00225113
C 5.19292628 -1.46587496 3.43291384
C 4.32993599 -2.73594238 3.49359909
C 3.62516646 -3.01205988 2.15808161
C 3.26953907 -1.70889758 1.42728473
C 3.51281006 0.33811154 -0.91534324
C 2.06582461 0.52801523 -0.46481079
C 0.43683275 0.11621764 -3.94916407
H -0.18263805 -0.69048446 -5.85399044
H 0.55918806 -3.03243374 -5.24739105
H 1.80774299 -1.82484222 -1.50188516
H -4.45488273 2.39655647 2.90332657
H -6.52966235 1.55915425 4.08454447
H -8.25592253 0.37338146 0.51299146
H -6.24987838 1.15167807 -0.81619840
H -4.77827476 4.32933308 -0.10256257
H -2.87435633 4.21566057 -1.97119510
H -2.16498116 4.73891781 -0.43883134
H -2.12100401 -1.71174973 0.60674013
H -2.04874741 -0.21210555 2.63799182
H -3.74251913 0.20067759 2.38128550
H -2.53527107 -2.44364056 3.25839892
H -3.97083430 -1.60967278 3.85602032
H -3.68393787 -3.51841679 1.51966629
H -5.00622158 -3.35448578 2.67723445
H -5.69505315 -1.18027699 1.55080421
H -5.81615875 -2.58263869 0.48598064
H -4.63889584 -0.40719189 -0.45044883
H -3.91055285 -1.96488823 -0.80596028
H -0.46959446 -0.83024440 -0.40481137
H 1.15674015 2.51089519 -3.20294009
H 1.66420718 3.05004961 -1.61779397
H 3.34969618 1.24923219 -3.49677173
H 3.61552113 2.96714535 -3.19632290
H 3.83040570 2.49961152 -0.70768445
H 6.08764042 2.43669163 -1.60947847
H 5.79656196 0.93854825 -2.49478971
H 6.66478466 -0.05734740 -0.50312188
H 6.19522836 1.32159463 0.50385445
H 6.36351277 -0.31079714 2.02011446
H 6.25422625 -2.02675307 1.61129898
H 6.06250528 -1.56477440 4.10633167
H 4.61376021 -0.60038473 3.80815657
H 4.95444554 -3.60551754 3.76816536
H 3.58331116 -2.63568205 4.30241984
H 4.27730839 -3.62812911 1.50949898
H 2.71261034 -3.61130387 2.32743648
H 2.59119484 -1.09851680 2.05978169
H 2.70759585 -1.96209006 0.50483662
H 3.54542194 -0.42323387 -1.73859870
H 2.00935919 1.32420657 0.30422365
H 1.66501162 -0.37536268 0.02362459
H 0.11058920 1.13175596 -4.18187029
$end

$rem
JOB_TYPE sp
method b3lyp
BASIS 6-31g++(d,p)
cis_n_roots 1
cis_singlets true
cis_triplets true
max_cis_cycles 500
RPA 1
max_scf_cycles 1000
molden_format true
mem_static 12000
mem_total 200000
$end

$comment
tddft_sp
$end

Input looks okay, although MEM_STATIC is capped at 8000 and shouldn’t be necessary for this job. This is 1068 basis functions, which is certainly doable, and for just one excited state the memory requirement should not be very large at all. At what point in the Q-Chem output does it crash? You might try a small basis set first (e.g., 3-21G) to see if the job runs to completion.

Thank you for your answer. As you suggested, I reduced MEM_STATIC to 1000 and tested with the smaller basis sets 6-31g(d) and 6-31++g(d). Q-Chem runs well with the smaller basis set (e.g. 6-31g(d)), but the error still occurs with 6-31++g(d).

Two error types appear: EXIT CODE 6 and EXIT CODE 11.
With EXIT CODE 6, the output shows that Q-Chem tries to start the CIS iterations, but no iteration is printed, as shown below:

Direct TDDFT/TDA calculation will be performed
 Exchange:     0.2000 Hartree-Fock + 0.0800 Slater + 0.7200 B88
 Correlation:  0.1900 VWN1RPA + 0.8100 LYP
 Using SG-1 standard quadrature grid
 Triplet excitation energies requested
 Singlet excitation energies requested
 CIS energy converged when residual is below 10e- 6
-----------------------------------------------------
 Iter    Rts Conv    Rts Left    Ttl Dev     Max Dev
-----------------------------------------------------

=========================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=========================================================

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

With EXIT CODE 11, the Q-Chem output stops at the same point, but for this type of error (most crashed calculations give EXIT CODE 11) there are no dashed lines and no headers, as shown below:

Direct TDDFT/TDA calculation will be performed
 Exchange:     0.2000 Hartree-Fock + 0.0800 Slater + 0.7200 B88
 Correlation:  0.1900 VWN1RPA + 0.8100 LYP
 Using SG-1 standard quadrature grid
 Triplet excitation energies requested
 Singlet excitation energies requested
 CIS energy converged when residual is below 10e- 6

=========================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=========================================================

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

I have tried changing MEM_STATIC and MEM_TOTAL, but I can’t find a solution, and I still have no idea why these calculations crash. Can you give me more advice on this issue?

Please try with the small basis set but with the larger memory settings. You are using a rather old version of Q-Chem; newer versions print some information prior to the TDDFT iterations about whether the memory settings are adequate.

Thanks for the advice.

As a newcomer to Q-Chem, I have a basic question. How should I set MEM_STATIC and MEM_TOTAL properly? Is there any guideline or upper limit? As you already mentioned, I use an old version of Q-Chem, so I need to understand what the MEM_ variables do.

As I understand it, MEM_TOTAL is the maximum amount of memory, so I guess its upper limit is the available RAM. MEM_STATIC, on the other hand, is hard to understand. The manual says that we should set MEM_STATIC to store “multiple data sets and the thing managed by codes”, and that (MEM_TOTAL - MEM_STATIC) is dynamic memory, which is allocated using system calls. But what is the difference between storage managed by the code and storage allocated via system calls?

The reason I ask is that I think the implicitly defined dynamic memory might be the cause of my error. For example, if the total available RAM is 120 GB and I set MEM_TOTAL to 200 GB and MEM_STATIC to 0.2 GB, the implied dynamic memory (200 GB - 0.2 GB = 199.8 GB) exceeds the capacity of my RAM, so that much “dynamic memory” is not actually available.

So, would you mind giving me some advice on setting MEM_TOTAL and MEM_STATIC, and on their potential problems?

There is usually no need to mess with MEM_STATIC anymore; that is Fortran memory, and most of the memory bottlenecks have been migrated to dynamic memory allocation. MEM_TOTAL sets the maximum memory that Q-Chem is allowed to use, and if you’re unsure then there’s no reason not to set it to something approaching the maximum memory that is available on your hardware.
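
As a rough illustration of the point about MEM_TOTAL and physical memory, here is a minimal Python sketch that compares a MEM_TOTAL value (which Q-Chem reads in MB) with the RAM on the node. The function names and the 90% headroom factor are assumptions made for illustration, not Q-Chem recommendations, and the RAM query relies on POSIX `sysconf`, so it is only expected to work on Linux-like systems.

```python
import os


def physical_ram_mb():
    """Return installed RAM in MB via POSIX sysconf, or None if unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None
    return pages * page_size / 1024**2


def check_mem_total(mem_total_mb, ram_mb, headroom=0.9):
    """Check a MEM_TOTAL request (MB) against a fraction of the node's RAM.

    headroom is a hypothetical safety factor chosen for this sketch;
    it leaves some memory for the operating system and other processes.
    Returns (fits, limit_mb).
    """
    limit_mb = headroom * ram_mb
    return mem_total_mb <= limit_mb, limit_mb
```

With the numbers from the question, `check_mem_total(200000, 120000)` reports that the request does not fit, since a MEM_TOTAL of 200000 MB is well above the roughly 120000 MB installed on the node.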

Thank you for your advice.

Unfortunately, I haven’t solved my error, even though I have tried many runs on the failing molecules, both with large MEM_TOTAL and/or MEM_STATIC values and with the defaults. I have also tried setting the -np option (the number of processors) to fewer than all available threads, but Q-Chem still gives the same error.

In my PBS error file, Q-Chem always reports SockError: Recv error. Would you mind giving me some advice on this error? I’m not sure this error message is helpful, but I have no other clue.

On my system this job runs with very minimal memory (using Q-Chem v. 6.1), i.e., omitting MEM_STATIC entirely (as I suggested above) and MEM_TOTAL=1000. On 40 cores the SCF takes 379 s. The memory messaging (not present in your version of Q-Chem) suggests 4.1 Mb of additional memory per TDDFT iteration, and both singlets and triplets take 7 iterations each for TDA/TDDFT, then 10 iterations for RPA.

The error you are getting looks like it comes from the operating system (or possibly the batch scheduler), not from Q-Chem. If problems persist with minimal memory settings then I suggest turning the memory way down, or running this as separate calculations: singlet TDA, then triplet TDA (via CIS_SINGLETS = FALSE or CIS_TRIPLETS = FALSE), then singlet RPA, then triplet RPA (using RPA=2 to skip TDA).
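
To make the four-job split concrete, here is a small Python sketch that writes out one $rem section per job. The helper names and job labels are hypothetical; the keyword values come from this thread, except that `RPA false` for the TDA-only jobs is my assumption about how to switch RPA off. The $molecule section would be pasted in unchanged.

```python
# Settings shared by all four jobs; values taken from this thread.
BASE_REM = {
    "JOB_TYPE": "sp",
    "METHOD": "b3lyp",
    "BASIS": "6-31++G**",
    "CIS_N_ROOTS": "1",
    "MAX_CIS_CYCLES": "500",
    "MAX_SCF_CYCLES": "1000",
    "MEM_TOTAL": "1000",
}

# (label, CIS_SINGLETS, CIS_TRIPLETS, RPA); labels are hypothetical names.
# RPA = 2 skips TDA, as described above; RPA = false is assumed to give TDA only.
SPLIT_JOBS = [
    ("singlet_tda", "true", "false", "false"),
    ("triplet_tda", "false", "true", "false"),
    ("singlet_rpa", "true", "false", "2"),
    ("triplet_rpa", "false", "true", "2"),
]


def rem_section(settings):
    """Format a dict of keywords as a Q-Chem $rem section."""
    lines = [f"{key} {value}" for key, value in settings.items()]
    return "$rem\n" + "\n".join(lines) + "\n$end\n"


def split_rem_sections():
    """Return {label: $rem text} for the four separate calculations."""
    sections = {}
    for label, singlets, triplets, rpa in SPLIT_JOBS:
        settings = dict(
            BASE_REM, CIS_SINGLETS=singlets, CIS_TRIPLETS=triplets, RPA=rpa
        )
        sections[label] = rem_section(settings)
    return sections
```

Printing `split_rem_sections()["triplet_rpa"]`, for example, gives a $rem section with `CIS_SINGLETS false`, `CIS_TRIPLETS true` and `RPA 2`, which can be combined with the original $molecule section to form one input file per job.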

Here is the memory messaging, 6-31++G** basis set:

 CIS/TDDFT memory preview:
 --------------------------
        NOV = NOa*NVa + NOb*NVb = 270900
        NRoots = 1, max iter = 500, max vectors = 500
        Size of each subspace vector: 2.07 Mb
        Davidson algorithm may add up to 2 x NOV x NRoots (= 4.1 Mb) per iteration
        Max memory = 2066.8 Mb (worst case scenario)
        Currently available memory = 1000.0 Mb

        In case of problems:
        (a) increase memory (MEM_TOTAL = 2067 can hold 500 subspace vectors), or
        (b) request fewer states (CIS_N_ROOTS = 0 will fit in current MEM_TOTAL), or
        (c) reduce MAX_CIS_SUBSPACE (= 241 will fit in current MEM_TOTAL).
        Reducing MAX_CIS_SUBSPACE may hamper convergence.
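
The numbers in this preview can be reproduced with simple arithmetic. The Python sketch below is reverse-engineered from the printed values, not taken from Q-Chem's source: it assumes 8-byte double-precision amplitudes, 1 Mb = 2^20 bytes, and the factor of 2 per root per iteration that the message itself quotes. The function name and dictionary keys are made up for illustration.

```python
BYTES_PER_DOUBLE = 8   # assumption: amplitudes stored in double precision
BYTES_PER_MB = 1024**2  # assumption: "Mb" in the preview means 2**20 bytes


def cis_memory_preview(nov, n_roots, max_vectors, mem_total_mb):
    """Estimate Davidson subspace memory from the quantities in the preview.

    nov          : NOa*NVa + NOb*NVb (length of one subspace vector)
    n_roots      : number of requested roots (NRoots)
    max_vectors  : maximum number of subspace vectors
    mem_total_mb : available memory in MB
    """
    vector_mb = nov * BYTES_PER_DOUBLE / BYTES_PER_MB
    per_iter_mb = 2 * vector_mb * n_roots       # "2 x NOV x NRoots"
    worst_case_mb = per_iter_mb * max_vectors   # every allowed vector stored
    max_subspace_fit = int(mem_total_mb // per_iter_mb)
    return {
        "vector_mb": vector_mb,
        "per_iter_mb": per_iter_mb,
        "worst_case_mb": worst_case_mb,
        "max_subspace_fit": max_subspace_fit,
    }
```

Calling `cis_memory_preview(270900, 1, 500, 1000.0)` gives about 2.07 Mb per vector, 4.1 Mb per iteration, a worst case of 2066.8 Mb, and 241 subspace vectors that fit in 1000 Mb, matching the preview for this job. I have only checked it against this one-root case, so treat the scaling with NRoots as a guess.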

I really appreciate your answer. I finally found the problem and fixed it! It was a multiprocessing problem, not a memory problem.

I repeated the same calculation while halving the number of processors (-np option in Q-Chem 4.4) step by step, from the maximum number of CPUs down to 1. I found that the program runs without any errors only with 24 processors or fewer.

I have fixed the problem, but I still have a question. When I look at the Q-Chem output, I only see 8 threads at the very beginning. Does this mean that it doesn’t recognize all (say, 24) processors as it should? When I monitor the processors during the SCF calculation, it uses all the processors I set, but TDDFT uses only one processor, so I think this may have been caused by reducing the number of processors.

There is some memory overhead associated with multithreading that I do not completely understand. If you contact Q-Chem support, they might have a better answer.

Regarding the number of threads, this should be printed just before the start of the SCF cycles and should correspond to whatever was specified with -np. TDDFT should be using the same number of threads. I haven’t benchmarked it carefully, but we definitely see noticeable speedups for multithreading in both SCF and TDDFT. Possibly the latter is a more recent feature than your version of Q-Chem (?). Not sure.

Thank you for your kind answer. I’ll contact Q-Chem support to check on multithreading in both SCF and TDDFT in the recent versions.