Hi Zhao,
my guess is that in your faster case you are using hyperthreading,
whereas in the Slurm case you are not.
Can you check what performance you get when you add
#SBATCH --hint=multithread
to your Slurm script?
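For example (a minimal sketch, keeping the rest of your script unchanged):

#SBATCH -N 1
#SBATCH --ntasks=36
#SBATCH --mem=64G
#SBATCH --hint=multithread   # let Slurm use the extra hardware threads (in-core multithreading)

You can also compare what the node and the allocation actually look like, e.g.:

lscpu | grep -E 'Thread|Core|Socket'          # hardware threads per core on the node
scontrol show job <jobid> | grep -i numcpus   # CPUs actually granted to the Slurm job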
Another difference between the two might be:
a) the communication channel/interface that is used, and
b) the number of nodes involved: when using mpirun directly you might be
running on more than one node.
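A quick way to check both (a sketch; the Intel MPI variable is an assumption,
adjust for whichever MPI flavour your VASP build uses):

# which hosts, and how many tasks per host, each variant actually uses
mpirun -n 36 hostname | sort | uniq -c   # direct run
srun hostname | sort | uniq -c           # inside the Slurm job script

# with Intel MPI, this prints the selected fabric/interface and the process pinning
I_MPI_DEBUG=5 mpirun vasp_std

Your script already prints SLURM_JOB_NODELIST, which should confirm the node
count for the Slurm case.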
Regards,
Hermann
On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
Dear Slurm Users,
I am experiencing a significant performance discrepancy when running
the same VASP job through the Slurm scheduler compared to running it
directly with mpirun. I am hoping for some insights or advice on how
to resolve this issue.
System Information:
Slurm Version: 21.08.5
OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G
echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""
module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 14.4893: real time 14.5049
LOOP: cpu time 14.3538: real time 14.3621
LOOP: cpu time 14.3870: real time 14.3568
LOOP: cpu time 15.9722: real time 15.9018
LOOP: cpu time 16.4527: real time 16.4370
LOOP: cpu time 16.7918: real time 16.7781
LOOP: cpu time 16.9797: real time 16.9961
LOOP: cpu time 15.9762: real time 16.0124
LOOP: cpu time 16.8835: real time 16.9008
LOOP: cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
LOOP: cpu time 9.0072: real time 9.0074
LOOP: cpu time 9.0515: real time 9.0524
LOOP: cpu time 9.1896: real time 9.1907
LOOP: cpu time 10.1467: real time 10.1479
LOOP: cpu time 10.2691: real time 10.2705
LOOP: cpu time 10.4330: real time 10.4340
LOOP: cpu time 10.9049: real time 10.9055
LOOP: cpu time 9.9718: real time 9.9714
LOOP: cpu time 10.4511: real time 10.4470
LOOP: cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
Could you provide any insights or suggestions on what might be causing
this performance issue? Are there any specific configurations or
settings in Slurm that I should check or adjust to align the
performance more closely with the direct mpirun execution?
Thank you for your time and assistance.
Best regards,
Zhao