Several problems using IQmol with HPC (Slurm)

Hi,

I have been setting up Q-Chem (version 5.3) on a Linux cluster (with Slurm) and IQmol on Macs and Windows to evaluate whether we can use it for an undergraduate quantum chemistry course.

We managed to get IQmol to submit some simple jobs and run them on the cluster (through Slurm), but we found several small issues with using IQmol (please find the attached figures at the bottom):

  1. All highlighted tabs in the pop-up menus use the wrong color scheme (i.e. white text on a white background in dark mode). (See Fig 1)

  2. Setting geometry constraints doesn’t seem to work: nothing happens after selecting it from the menu bar. [18 June Update] OK, I figured out how to get it to work: the atoms need to be selected first. My bad, sorry!

  3. The template Slurm script is not up to date:

    a — “qchem -seq” is no longer accepted by qchem. In addition, running Q-Chem serially may no longer be a reasonable default. (See Fig 2)

    b — Please provide some examples for users on how to change the Slurm script template, and how to load the Q-Chem module or source qcenv.csh.

    Please also list the internal variables (e.g. $NCPUS, $MEMORY, $QUEUE) used in IQmol; users may have difficulty reconfiguring the Slurm script without this set of variables. [18 June Update] OK, I found them in the Help Browser.

    c — This is the Slurm template we used in IQmol (See Fig 3)

  4. The job submission menu can be improved in several areas:

    a — The pulldown menu for choosing the queue is not updated automatically: I created a new queue/partition for qchem on the server AFTER setting up the server information in IQmol, and the new partition is not displayed (or it is hidden at the bottom of the non-scrollable pulldown menu). (See Fig 4)

    To work around this issue, I had to define the queue and QOS explicitly in the Slurm template.

    b — Besides the partition (queue), Slurm users may also need to select a QOS (quality of service), which is related to job priority and usage accounting, but there is no way to select it from the menu. (See Fig 5a)

    You may use this command to query the available QOS:
    sacctmgr show qos format=name%15 -n

    c — Most HPC server nodes have more than 100GB of memory, but the menu prevents users from entering >=100000MB or changing the units. (See Fig 5b)

    d — I know that MPI is not favoured by the Q-Chem developers, but I believe it is still supported for HF and DFT calculations. IQmol does not seem to allow MPI jobs, and there is no way to set the number of nodes or how MPI tasks are assigned in the submission menu. (See Fig 5c)

    e — How can users run the GPU version of Q-Chem with IQmol?

  5. The Job Monitor is not able to retrieve the correct running status from Slurm. (See Fig 6)

    In addition, I would suggest showing the scheduler’s Job ID in the Job Monitor as well.

    [18 June Update] IQmol doesn’t keep Slurm’s Job ID.
    I added the following lines to the Slurm template:
    echo SLURM_JOB_ID=$SLURM_JOB_ID
    echo SLURM_JOB_NAME=$SLURM_JOB_NAME
    echo JOB_ID=${JOB_ID}
    echo JOB_NAME=${JOB_NAME}
    echo JOB_DIR=${JOB_DIR}

    In the output/err file, the variable ${JOB_ID} turns out to be empty (undefined):
    SLURM_JOB_ID=81885
    SLURM_JOB_NAME=C2H4.run
    JOB_ID=
    JOB_NAME=C2H4
    JOB_DIR=/scratch/chiensh/C2H4

  6. Since the Job Monitor cannot retrieve the correct job status, users cannot kill jobs or copy results until the job has finished. (See Fig 7)

  7. I remember that I could view the live output file with the Job Monitor (View Output File) while a job was running on my desktop (without Slurm). It would be a highly desirable feature if this also worked for jobs on a remote cluster.

  8. I believe more and more Mac users will switch to iPads; it would be very nice if IQmol could run on iPads.

  9. Please find the figures below (It is not reasonable to allow only 1 picture for each message)

Thanks a lot!

Best Regards,
Dominic Chien
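
As a rough illustration of items 3 and 4 in the post above, a Slurm template that sets the partition and QOS explicitly and runs a threaded rather than a serial job might look like the sketch below. ${QUEUE}, ${NCPUS}, ${MEMORY}, ${JOB_NAME} and ${JOB_DIR} are the IQmol variables mentioned in this thread. The QOS name "normal", the module name "qchem", reading ${MEMORY} as a value in MB, and the .inp/.out file names are assumptions that need to be adapted to the local site.

```shell
#!/bin/bash
# Sketch only: IQmol is assumed to substitute the ${...} placeholders
# before the script is handed to sbatch.
#SBATCH --job-name=${JOB_NAME}
#SBATCH --partition=${QUEUE}
#SBATCH --qos=normal              # hypothetical QOS name, set explicitly
#SBATCH --nodes=1
#SBATCH --cpus-per-task=${NCPUS}
#SBATCH --mem=${MEMORY}           # assumed to be in MB (Slurm's default unit)

# Hypothetical module name; alternatively source the site's qcenv.sh / qcenv.csh.
module load qchem

cd "${JOB_DIR}"

# Record the scheduler's job ID in the output for later reference.
echo "SLURM_JOB_ID=${SLURM_JOB_ID}"

# Threaded (OpenMP) run instead of "qchem -seq"; file names are assumed.
qchem -nt ${NCPUS} ${JOB_NAME}.inp ${JOB_NAME}.out
```

Hard-wiring the QOS in the template mirrors the workaround described in item 4a; a site with several QOS levels would need one server entry in IQmol per QOS.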

There is quite a bit to unpack here, but let me try to address your concerns:

  1. IQmol does not explicitly set the font colours; this is handled by the Qt libraries, which in turn rely on the system libraries. I have not seen this problem before. Which version of macOS are you using?

  2. The submission template is designed to be flexible so that it can be used with various versions of IQmol, Q-Chem, the scheduler and modules. It cannot be configured to work out of the box in all cases, but I take your point about modifying the default to handle parallel jobs and will update the runfile template.

  3. a. The queue names and resource limits are cached the first time you attempt to submit a job to a particular server, so subsequent changes will not be visible until after IQmol restarts. This is done to reduce communication overhead.

    b. Is the QOS a parameter which the user is likely to want to change for each
    job, or is it something which could be entered into the runfile template?

    e. I have not used the GPU version of Q-Chem, but I believe you need to pass the -gpu flag to qchem. Presently the only way to do this is to hard-wire this flag in the runfile template. If you want to run both with and without GPU support, you would need to configure separate servers within IQmol.

  4. For SLURM servers, IQmol is expecting a return from sbatch of the form:
    Submitted batch job xxxx
    and sets the job number to xxxx. Is this not what your sbatch is returning?

  5. This is a consequence of the job-status problem in the previous item.

  6. I will take this as a feature request.

  7. I agree being able to run on mobile devices would be nice, but porting it would be a significant undertaking and operating through the App Store incurs additional costs.
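
For reference, the sbatch message format described in item 4 can be checked by hand with a few lines of shell. This is only a sketch of the expected format, not IQmol's actual implementation; the function names parse_sbatch_job_id and job_state are hypothetical.

```shell
#!/bin/sh
# Extract the numeric job ID from a "Submitted batch job <id>" line.
# Prints nothing if the input does not match that form.
parse_sbatch_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Ask Slurm for the state of one job (needs squeue on PATH).
# -h suppresses the header, %T prints the state, e.g. PENDING or RUNNING.
job_state() {
    squeue -h -j "$1" -o '%T'
}

# Example with a captured sbatch message; no cluster is needed for this part.
job_id=$(echo "Submitted batch job 81885" | parse_sbatch_job_id)
echo "job id: $job_id"
```

On the cluster itself, something like `sbatch C2H4.run | parse_sbatch_job_id` followed by `job_state <id>` would show whether the scheduler side behaves as expected, independently of the Job Monitor.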

Thanks, Andrew!

  1. I am running macOS version 11.4 on an Intel-based Mac (both Light and Dark modes were tried), and the version of IQmol is 2.15.0.

Here is more information for you to debug:

  1. Noted with thanks; it would be even better if it could handle both the MPI and GPU versions in the future.

a. I have restarted IQmol and reconnected to the server many times. I think the problem is simply that the partition “qchem” was added last, so it sits at the bottom of the list and is hidden because the pulldown menu is not scrollable.

b. Yes, users may choose a different QOS for different job priorities and privileges. At some other sites, users may even want to set their account, which determines how usage credits/money are charged.

These can be listed with the commands:
QOS: sacctmgr show qos -n | awk '{print $1}'
User accounts: sacctmgr show user $USER -s -n | awk '{print $5}'

e. noted
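
The QOS and account queries above could be wrapped into small helper functions, for example to fill in a selection list. The sketch below is one way to do it and makes some assumptions: it uses sacctmgr's parsable output (-P) and asks for a single named field instead of counting awk columns, and the function names are hypothetical.

```shell
#!/bin/sh
# Keep the first '|'-separated field of each line, dropping blanks and duplicates.
first_field_unique() {
    awk -F'|' '$1 != "" { print $1 }' | sort -u
}

# QOS names defined on the cluster (needs sacctmgr on PATH).
list_qos() {
    sacctmgr -n -P show qos format=name | first_field_unique
}

# Accounts associated with the current user (needs sacctmgr on PATH).
list_accounts() {
    sacctmgr -n -P show assoc user="$USER" format=account | first_field_unique
}

# Only query Slurm when it is actually available on this machine.
if command -v sacctmgr >/dev/null 2>&1; then
    echo "QOS:";      list_qos
    echo "Accounts:"; list_accounts
fi
```

Requesting one field by name avoids depending on the column position, which can shift when a column in the default sacctmgr layout is empty.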

  1. The return of “Submitted batch job 1234” is correct, but it seems that the number has not been passed back to IQmol correctly; I tested it on both the Mac and Windows versions.
    Somehow IQmol doesn’t know the job status.

  2. noted.

Additional update: we have not been able to download the latest version (2.15.0) for Windows from the official site in the last few days; the download gets stuck at 17MB. There is no such problem with the older versions or with 2.15.0 for Mac.

Update on 4e) I found that IQmol actually does retrieve the Job ID and the status (they can be seen when the cursor hovers over the job status), but the job status in the list is not updated.

New Issues:
9) Animation recording fails on both Windows (2.14.1) and Mac (2.13 and 2.15):
a) If I select “Record Animation” from the icon or drop-down menu and save in the default format (mov), it generates a number of screenshots in png format, but it does not combine them into a single movie when recording is stopped.
b) If I change the format from the default mov to MP4, IQmol does not record the movie (no png image files are generated).