On Fri, May 24, 2024 at 9:32 PM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
>
> Dear Slurm Users,
>
> I am experiencing a significant performance discrepancy when running
> the same VASP job through the Slurm scheduler compared to running it
> directly with mpirun. I am hoping for some insights or advice on how
> to resolve this issue.
>
> System Information:
>
> Slurm Version: 21.08.5
> OS: Ubuntu 22.04.4 LTS (Jammy)
>
> Job Submission Script:
>
> #!/usr/bin/env bash
> #SBATCH -N 1
> #SBATCH -D .
> #SBATCH --output=%j.out
> #SBATCH --error=%j.err
> ##SBATCH --time=2-00:00:00
> #SBATCH --ntasks=36
> #SBATCH --mem=64G
>
> echo '#######################################################'
> echo "date = $(date)"
> echo "hostname = $(hostname -s)"
> echo "pwd = $(pwd)"
> echo "sbatch = $(which sbatch | xargs realpath -e)"
> echo ""
> echo "WORK_DIR = $WORK_DIR"
> echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
> echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
> echo "SLURM_NTASKS = $SLURM_NTASKS"
> echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
> echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
> echo "SLURM_JOBID = $SLURM_JOBID"
> echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
> echo "SLURM_NNODES = $SLURM_NNODES"
> echo "SLURMTMPDIR = $SLURMTMPDIR"
> echo '#######################################################'
> echo ""
>
> module purge > /dev/null 2>&1
> module load vasp
> ulimit -s unlimited
> mpirun vasp_std
>
> Performance Observation:
>
> When running the job through Slurm:
>
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> LOOP: cpu time 14.4893: real time 14.5049
> LOOP: cpu time 14.3538: real time 14.3621
> LOOP: cpu time 14.3870: real time 14.3568
> LOOP: cpu time 15.9722: real time 15.9018
> LOOP: cpu time 16.4527: real time 16.4370
> LOOP: cpu time 16.7918: real time 16.7781
> LOOP: cpu time 16.9797: real time 16.9961
> LOOP: cpu time 15.9762: real time 16.0124
> LOOP: cpu time 16.8835: real time 16.9008
> LOOP: cpu time 15.2828: real time 15.2921
> LOOP+: cpu time 176.0917: real time 176.0755
>
> When running the job directly with mpirun:
>
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> LOOP: cpu time 9.0072: real time 9.0074
> LOOP: cpu time 9.0515: real time 9.0524
> LOOP: cpu time 9.1896: real time 9.1907
> LOOP: cpu time 10.1467: real time 10.1479
> LOOP: cpu time 10.2691: real time 10.2705
> LOOP: cpu time 10.4330: real time 10.4340
> LOOP: cpu time 10.9049: real time 10.9055
> LOOP: cpu time 9.9718: real time 9.9714
> LOOP: cpu time 10.4511: real time 10.4470
> LOOP: cpu time 9.4621: real time 9.4584
> LOOP+: cpu time 110.0790: real time 110.0739
>
> Could you provide any insights or suggestions on what might be causing
> this performance issue? Are there any specific configurations or
> settings in Slurm that I should check or adjust to bring the
> performance closer to the direct mpirun execution?
>
> Thank you for your time and assistance.
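A minimal sketch of how the CPU binding of the two launch modes could be compared (this is an illustrative diagnostic, not part of the job script above; it assumes vasp_std is built against Open MPI, and the srun line would run inside the same sbatch allocation):

# Inside the sbatch allocation: show which core(s) each Slurm task is bound to.
srun --ntasks=36 --cpu-bind=verbose,cores bash -c 'echo "$(hostname): $(taskset -cp $$)"'

# Direct run: if the MPI behind vasp_std is Open MPI, mpirun can report
# its own process-to-core bindings.
mpirun -n 36 --report-bindings /bin/true

Differences between the two binding reports (e.g. tasks sharing hyperthreads under one launcher but not the other) would be one way to account for the gap in LOOP times.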
The attachment is the test example used above.

> Best regards,
> Zhao
> --
> Assoc. Prof. Hongsheng Zhao <hongyi.z...@gmail.com>
> Theory and Simulation of Materials
> Hebei Vocational University of Technology and Engineering
> No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
<<attachment: Cr72_3x3x3K_350eV_10DAV.zip>>