On Sat, May 25, 2024 at 12:02 AM Hermann Schwärzler via slurm-users
<slurm-users@lists.schedmd.com> wrote:
>
> Hi Zhao,
>
> my guess is that in your faster case you are using hyperthreading,
> whereas in the Slurm case you aren't.
>
> Can you check what performance you get when you add
>
> #SBATCH --hint=multithread
>
> to your slurm script?
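For my own understanding, whether hyperthreading (SMT) is actually in play
should be checkable with standard commands like the following; nothing here
is specific to my machine, and `localhost' is simply the node name that
appears in SLURM_JOB_NODELIST in the output below:

    # Does the CPU expose more than one hardware thread per core?
    lscpu | grep -E 'Thread|Core|Socket'

    # What hardware does slurmd detect, versus what slurm.conf declares?
    slurmd -C
    scontrol show node localhost | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore'

If lscpu reports 2 threads per core but the node definition has
ThreadsPerCore=1 (or the other way around), Slurm and a bare mpirun end up
seeing a different number of usable CPUs, which could already explain part
of the difference.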
I tried adding "#SBATCH --hint=multithread" to the slurm script, only to
find that the job then hangs forever. Here is the output 10 minutes after
the job was submitted:

werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ cat sub.sh.o6
#######################################################
date = Sat May 25 07:31:31 CST 2024
hostname = x13dai-t
pwd = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
sbatch = /usr/bin/sbatch

WORK_DIR =
SLURM_SUBMIT_DIR = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
SLURM_JOB_NUM_NODES = 1
SLURM_NTASKS = 36
SLURM_NTASKS_PER_NODE =
SLURM_CPUS_PER_TASK =
SLURM_JOBID = 6
SLURM_JOB_NODELIST = localhost
SLURM_NNODES = 1
SLURMTMPDIR =
#######################################################

 running   36 mpi-ranks, on    1 nodes
 distrk:  each k-point on   36 cores,    1 groups
 distr:  one band on    4 cores,    9 groups
 vasp.6.4.3 19Mar24 (build May 17 2024 09:27:19) complex
 POSCAR found type information on POSCAR Cr
 POSCAR found :  1 types and     72 ions
 Reading from existing POTCAR
 scaLAPACK will be used
 Reading from existing POTCAR
 -----------------------------------------------------------------------------
|                                                                             |
|           ----> ADVICE to this user running VASP <----                      |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------
 LDA part: xc-table for (Slater+PW92), standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read

> Another difference between the two might be
> a) the communication channel/interface that is used.

I tried `mpirun', `mpiexec', and `srun --mpi pmi2', and they all show
behavior similar to what is described above.

> b) the number of nodes involved: when using mpirun you might run things
> on more than one node.

This is a single-node cluster with two sockets.
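For completeness, the placement of the ranks in the two runs can be compared
directly with something like the following; the --report-bindings flag
assumes mpirun comes from Open MPI (with Intel MPI, setting I_MPI_DEBUG=5
prints similar pinning information), while the srun option is plain Slurm:

    # direct launch: report which core(s) each rank is bound to (Open MPI assumed)
    mpirun -n 36 --report-bindings vasp_std

    # launch through Slurm: let srun report the CPU binding of every task
    srun -n 36 --cpu-bind=verbose,cores vasp_std

If the Slurm job ends up packing two ranks onto each physical core while the
direct run spreads the ranks over all physical cores, a slowdown of the size
seen in the LOOP timings would not be surprising.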
> Regards,
> Hermann

Regards,
Zhao

> On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
> > Dear Slurm Users,
> >
> > I am experiencing a significant performance discrepancy when running
> > the same VASP job through the Slurm scheduler compared to running it
> > directly with mpirun. I am hoping for some insights or advice on how
> > to resolve this issue.
> >
> > System Information:
> >
> > Slurm Version: 21.08.5
> > OS: Ubuntu 22.04.4 LTS (Jammy)
> >
> > Job Submission Script:
> >
> > #!/usr/bin/env bash
> > #SBATCH -N 1
> > #SBATCH -D .
> > #SBATCH --output=%j.out
> > #SBATCH --error=%j.err
> > ##SBATCH --time=2-00:00:00
> > #SBATCH --ntasks=36
> > #SBATCH --mem=64G
> >
> > echo '#######################################################'
> > echo "date = $(date)"
> > echo "hostname = $(hostname -s)"
> > echo "pwd = $(pwd)"
> > echo "sbatch = $(which sbatch | xargs realpath -e)"
> > echo ""
> > echo "WORK_DIR = $WORK_DIR"
> > echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
> > echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
> > echo "SLURM_NTASKS = $SLURM_NTASKS"
> > echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
> > echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
> > echo "SLURM_JOBID = $SLURM_JOBID"
> > echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
> > echo "SLURM_NNODES = $SLURM_NNODES"
> > echo "SLURMTMPDIR = $SLURMTMPDIR"
> > echo '#######################################################'
> > echo ""
> >
> > module purge > /dev/null 2>&1
> > module load vasp
> > ulimit -s unlimited
> > mpirun vasp_std
> >
> > Performance Observation:
> >
> > When running the job through Slurm:
> >
> > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> >      LOOP:  cpu time   14.4893: real time   14.5049
> >      LOOP:  cpu time   14.3538: real time   14.3621
> >      LOOP:  cpu time   14.3870: real time   14.3568
> >      LOOP:  cpu time   15.9722: real time   15.9018
> >      LOOP:  cpu time   16.4527: real time   16.4370
> >      LOOP:  cpu time   16.7918: real time   16.7781
> >      LOOP:  cpu time   16.9797: real time   16.9961
> >      LOOP:  cpu time   15.9762: real time   16.0124
> >      LOOP:  cpu time   16.8835: real time   16.9008
> >      LOOP:  cpu time   15.2828: real time   15.2921
> >     LOOP+:  cpu time  176.0917: real time  176.0755
> >
> > When running the job directly with mpirun:
> >
> > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
> > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> >      LOOP:  cpu time    9.0072: real time    9.0074
> >      LOOP:  cpu time    9.0515: real time    9.0524
> >      LOOP:  cpu time    9.1896: real time    9.1907
> >      LOOP:  cpu time   10.1467: real time   10.1479
> >      LOOP:  cpu time   10.2691: real time   10.2705
> >      LOOP:  cpu time   10.4330: real time   10.4340
> >      LOOP:  cpu time   10.9049: real time   10.9055
> >      LOOP:  cpu time    9.9718: real time    9.9714
> >      LOOP:  cpu time   10.4511: real time   10.4470
> >      LOOP:  cpu time    9.4621: real time    9.4584
> >     LOOP+:  cpu time  110.0790: real time  110.0739
> >
> > Could you provide any insights or suggestions on what might be causing
> > this performance issue? Are there any specific configurations or
> > settings in Slurm that I should check or adjust to align the
> > performance more closely with the direct mpirun execution?
> >
> > Thank you for your time and assistance.
> >
> > Best regards,
> > Zhao
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com