On Fri, May 24, 2024 at 9:32 PM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
>
> Dear Slurm Users,
>
> I am experiencing a significant performance discrepancy when running
> the same VASP job through the Slurm scheduler compared to running it
> directly with mpirun. I am hoping for some insights or advice on how
> to resolve this issue.
>
> System Information:
>
> Slurm Version: 21.08.5
> OS: Ubuntu 22.04.4 LTS (Jammy)
>
> Job Submission Script:
>
> #!/usr/bin/env bash
> #SBATCH -N 1
> #SBATCH -D .
> #SBATCH --output=%j.out
> #SBATCH --error=%j.err
> ##SBATCH --time=2-00:00:00
> #SBATCH --ntasks=36
> #SBATCH --mem=64G
>
> echo '#######################################################'
> echo "date = $(date)"
> echo "hostname = $(hostname -s)"
> echo "pwd = $(pwd)"
> echo "sbatch = $(which sbatch | xargs realpath -e)"
> echo ""
> echo "WORK_DIR = $WORK_DIR"
> echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
> echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
> echo "SLURM_NTASKS = $SLURM_NTASKS"
> echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
> echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
> echo "SLURM_JOBID = $SLURM_JOBID"
> echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
> echo "SLURM_NNODES = $SLURM_NNODES"
> echo "SLURMTMPDIR = $SLURMTMPDIR"
> echo '#######################################################'
> echo ""
>
> module purge > /dev/null 2>&1
> module load vasp
> ulimit -s unlimited
> mpirun vasp_std
>
> Performance Observation:
>
> When running the job through Slurm:
>
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> LOOP: cpu time 14.4893: real time 14.5049
> LOOP: cpu time 14.3538: real time 14.3621
> LOOP: cpu time 14.3870: real time 14.3568
> LOOP: cpu time 15.9722: real time 15.9018
> LOOP: cpu time 16.4527: real time 16.4370
> LOOP: cpu time 16.7918: real time 16.7781
> LOOP: cpu time 16.9797: real time 16.9961
> LOOP: cpu time 15.9762: real time 16.0124
> LOOP: cpu time 16.8835: real time 16.9008
> LOOP: cpu time 15.2828: real time 15.2921
> LOOP+: cpu time 176.0917: real time 176.0755
>
> When running the job directly with mpirun:
>
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
> werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> LOOP: cpu time 9.0072: real time 9.0074
> LOOP: cpu time 9.0515: real time 9.0524
> LOOP: cpu time 9.1896: real time 9.1907
> LOOP: cpu time 10.1467: real time 10.1479
> LOOP: cpu time 10.2691: real time 10.2705
> LOOP: cpu time 10.4330: real time 10.4340
> LOOP: cpu time 10.9049: real time 10.9055
> LOOP: cpu time 9.9718: real time 9.9714
> LOOP: cpu time 10.4511: real time 10.4470
> LOOP: cpu time 9.4621: real time 9.4584
> LOOP+: cpu time 110.0790: real time 110.0739
>
> Could you provide any insights or suggestions on what might be causing
> this performance issue? Are there any specific configurations or
> settings in Slurm that I should check or adjust to bring the
> performance closer to the direct mpirun execution?
>
> Thank you for your time and assistance.
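A minimal sketch of how the CPU binding of the two launch modes could be compared (this is an illustrative diagnostic, not part of the job script above; it assumes vasp_std is built against Open MPI, and the srun line would run inside the same sbatch allocation):

# Inside the sbatch allocation: show which core(s) each Slurm task is bound to.
srun --ntasks=36 --cpu-bind=verbose,cores bash -c 'echo "$(hostname): $(taskset -cp $$)"'

# Direct run: if the MPI behind vasp_std is Open MPI, mpirun can report
# its own process-to-core bindings.
mpirun -n 36 --report-bindings /bin/true

Differences between the two binding reports (e.g. tasks sharing hyperthreads under one launcher but not the other) would be one way to account for the gap in LOOP times.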
The attachment is the test example used above.

> Best regards,
> Zhao
> --
> Assoc. Prof. Hongsheng Zhao <hongyi.z...@gmail.com>
> Theory and Simulation of Materials
> Hebei Vocational University of Technology and Engineering
> No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
<<attachment: Cr72_3x3x3K_350eV_10DAV.zip>>