On Sat, May 25, 2024 at 10:06 AM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 9:49 AM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
> >
> > On Sat, May 25, 2024 at 7:50 AM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
> > >
> > > On Sat, May 25, 2024 at 12:02 AM Hermann Schwärzler via slurm-users
> > > <slurm-users@lists.schedmd.com> wrote:
> > > >
> > > > Hi Zhao,
> > > >
> > > > my guess is that in your faster case you are using hyperthreading,
> > > > whereas in the Slurm case you aren't.
> > > >
> > > > Can you check what performance you get when you add
> > > >
> > > > #SBATCH --hint=multithread
> > > >
> > > > to your slurm script?
> > >
> > > I tried adding the above directive to the slurm script, but found
> > > that the job just gets stuck there indefinitely (a rough sketch of
> > > the script follows the output below). Here is the output 10 minutes
> > > after the job was submitted:
> > >
> > >
> > > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$
> > > cat sub.sh.o6
> > > #######################################################
> > > date                    = Sat May 25 07:31:31 CST 2024
> > > hostname                = x13dai-t
> > > pwd                     =
> > > /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
> > > sbatch                  = /usr/bin/sbatch
> > >
> > > WORK_DIR                =
> > > SLURM_SUBMIT_DIR        =
> > > /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
> > > SLURM_JOB_NUM_NODES     = 1
> > > SLURM_NTASKS            = 36
> > > SLURM_NTASKS_PER_NODE   =
> > > SLURM_CPUS_PER_TASK     =
> > > SLURM_JOBID             = 6
> > > SLURM_JOB_NODELIST      = localhost
> > > SLURM_NNODES            = 1
> > > SLURMTMPDIR             =
> > > #######################################################
> > >
> > >  running   36 mpi-ranks, on    1 nodes
> > >  distrk:  each k-point on   36 cores,    1 groups
> > >  distr:  one band on    4 cores,    9 groups
> > >  vasp.6.4.3 19Mar24 (build May 17 2024 09:27:19) complex
> > >
> > >  POSCAR found type information on POSCAR Cr
> > >  POSCAR found :  1 types and      72 ions
> > >  Reading from existing POTCAR
> > >  scaLAPACK will be used
> > >  Reading from existing POTCAR
> > >  
> > > -----------------------------------------------------------------------------
> > > |                                                                           |
> > > |               ----> ADVICE to this user running VASP <----               |
> > > |                                                                           |
> > > |     You have a (more or less) 'large supercell' and for larger cells it  |
> > > |     might be more efficient to use real-space projection operators.      |
> > > |     Therefore, try LREAL= Auto in the INCAR file.                         |
> > > |     Mind: For very accurate calculation, you might also keep the         |
> > > |     reciprocal projection scheme (i.e. LREAL=.FALSE.).                    |
> > > |                                                                           |
> > > -----------------------------------------------------------------------------
> > >
> > >  LDA part: xc-table for (Slater+PW92), standard interpolation
> > >  POSCAR, INCAR and KPOINTS ok, starting setup
> > >  FFT: planning ... GRIDC
> > >  FFT: planning ... GRID_SOFT
> > >  FFT: planning ... GRID
> > >  WAVECAR not read
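> > >
> > >  For reference, the relevant part of the submission script looks roughly
> > >  like the sketch below; the placement of the directive follows the
> > >  suggestion above, while the module setup and the path to the VASP binary
> > >  are placeholders rather than my exact settings:
> > >
> > >  #!/bin/bash
> > >  #SBATCH --nodes=1
> > >  #SBATCH --ntasks=36
> > >  #SBATCH --hint=multithread                   # the directive suggested above
> > >  # ... module/environment setup omitted ...
> > >  mpirun -np $SLURM_NTASKS /path/to/vasp_std   # placeholder binary path; launch line is illustrative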
> >
> > Ultimately, I found that the cause of the problem was that
> > hyper-threading was enabled by default in the BIOS. After disabling
> > hyper-threading, I observed that the computational efficiency is
> > consistent between running through Slurm and running mpirun directly.
> > It therefore appears that hyper-threading should not be enabled in the
> > BIOS when using Slurm.
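> >
> > In case it helps others, a quick way to check whether hyper-threading is
> > still visible to the OS and to Slurm is something like the following
> > (standard lscpu and "slurmd -C" invocations; the figures reported will of
> > course differ from machine to machine):
> >
> > $ lscpu | grep -E '^Thread|^Core|^Socket'   # "Thread(s) per core: 2" means HT is still enabled
> > $ slurmd -C                                 # prints the Sockets/CoresPerSocket/ThreadsPerCore layout slurmd detects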
>
> Regarding the reason, I think the explanation given in [1] is reasonable:
>
> It is recommended to disable processor hyper-threading. For workloads
> that are compute-intensive rather than I/O-intensive, enabling
> hyper-threading is likely to decrease the overall performance of the
> server. Intuitively, the physical memory available per core is reduced
> once hyper-threading is enabled.
>
> [1] 
> https://gist.github.com/weijianwen/acee3cd49825da8c8dfb4a99365b54c8#%E5%85%B3%E9%97%AD%E5%A4%84%E7%90%86%E5%99%A8%E8%B6%85%E7%BA%BF%E7%A8%8B

See here [1] for the related discussion.

[1] https://www.vasp.at/forum/viewtopic.php?t=19557

Regards,
Zhao

