Hi:

Thanks for your feedback, guys :).

We continue to find srun behaving properly re: core placement.
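
For anyone who wants to check core placement per task themselves, here is a minimal sketch (an illustration only, not the exact check we ran). It assumes Linux and Python 3, and the function name describe_affinity is made up for this example. Each process reports the CPUs the kernel allows it to run on, so a task that is not pinned shows up with a much wider CPU list than expected.

```python
import os


def describe_affinity():
    """Return a one-line summary of the CPUs this process may run on."""
    # SLURM_PROCID is set by srun for each task; fall back to "?" if unset.
    rank = os.environ.get("SLURM_PROCID", "?")
    # os.sched_getaffinity is Linux-only; 0 means "the calling process".
    cpus = sorted(os.sched_getaffinity(0))
    return f"task {rank} pid {os.getpid()} cpus {cpus}"


if __name__ == "__main__":
    print(describe_affinity())
```

Saved under a hypothetical name such as check_affinity.py, it could be launched once with srun and once with mpirun inside the same allocation to compare the reported CPU lists.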

BTW, we've further established that only MVAPICH (and therefore also Intel MPI) 
jobs are encountering the OOM issue.

==
Paul Brunk, system administrator
Georgia Advanced Resource Computing Center
Enterprise IT Svcs, the University of Georgia


Paul Edmon wrote:

We also noticed the same thing with 21.08.5.  In the 21.08 series SchedMD 
changed the way they handle cgroups to set the stage for cgroups v2 (see: 
https://slurm.schedmd.com/SLUG21/Roadmap.pdf).  Release 21.08.5 introduced a bug 
fix which then caused mpirun to not pin properly (particularly for older 
versions of MPI): https://github.com/SchedMD/slurm/blob/slurm-21-08-5-1/NEWS  
What we've recommended to users who have hit this is to switch to srun 
instead of mpirun, after which the situation clears up.
-Paul Edmon-

On 2/10/2022 8:59 AM, Ward Poelmans wrote:

I'm not sure if this is the case, but it might help to keep in mind the 
difference between mpirun and srun.
