Hi Brian,

Thanks for suggesting this interesting feature of Slurm, and sorry for the late follow-up; I only had access to the cluster for a short time.
We were able to run the HPL benchmark across different partitions with correct NUMA affinity. For future reference, here is the procedure:

$ salloc \
    --partition=v100 --nodes=1 --ntasks-per-node=40 --gres=gpu:4 : \
    --partition=a100 --nodes=1 --ntasks-per-node=64 --gres=gpu:8

$ srun \
    -n 4 : \
    -n 8 \
    hpl.sh

Initially we thought there would be some performance degradation when mixing partitions, but at least for this small-scale test it appears to be negligible.

Thanks.
Viet-Duc

On Thu, Dec 8, 2022 at 2:27 AM Brian Andrus <toomuc...@gmail.com> wrote:

> You may want to look here:
>
> https://slurm.schedmd.com/heterogeneous_jobs.html
>
> Brian Andrus
>
> On 12/7/2022 12:42 AM, Le, Viet Duc wrote:
>
> Dear slurm community,
>
> I am encountering a unique situation where I need to allocate jobs to
> nodes with different numbers of CPU cores. For instance:
>
> node01: Xeon 6226 (32 cores)
> node02: EPYC 7543 (64 cores)
>
> $ salloc --partition=all --nodes=2 --nodelist=gpu01,gpu02 --ntasks-per-node=32 --comment=etc
>
> If --ntasks-per-node is larger than 32, the job cannot be allocated,
> since node01 has only 32 cores.
>
> In the context of NVIDIA's HPL container, we need to pin MPI
> processes according to NUMA affinity for best performance.
> For HGX-1, the eight A100s have affinity with the 1st, 3rd, 5th, and 7th
> NUMA domains, respectively.
> With --ntasks-per-node=32, only the first half of the EPYC's NUMA domains is
> available, and we had to assign the 4th-7th A100s to the 0th and 2nd NUMA
> domains, leading to some performance degradation.
>
> I am looking for a way to request more tasks than the number of physically
> available cores, i.e.
>
> $ salloc --partition=all --nodes=2 --nodelist=gpu01,gpu02 --ntasks-per-node=64 --comment=etc
>
> Your suggestions are much appreciated.
>
> Regards,
> Viet-Duc
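[Editor's note: for completeness, below is a minimal, hypothetical sketch of what a NUMA-aware wrapper such as the hpl.sh used above might contain. It is not the actual NVIDIA script; it assumes one MPI rank per GPU, and the GPU-to-NUMA mapping should be adjusted to the output of `nvidia-smi topo -m` on the actual node.]

#!/bin/bash
# Hypothetical sketch only -- not the actual contents of NVIDIA's hpl.sh.
# Idea: pin each local MPI rank to the NUMA domain of "its" GPU before
# launching the HPL binary, so each hetjob component gets correct affinity.

rank=${SLURM_LOCALID:-0}

# Example GPU-to-NUMA mapping for an 8-GPU HGX node where pairs of A100s
# attach to NUMA domains 1, 3, 5 and 7; verify with `nvidia-smi topo -m`.
numa_map=(1 1 3 3 5 5 7 7)
numa=${numa_map[$rank]}

# One GPU per rank; bind CPU and memory to that GPU's NUMA domain.
export CUDA_VISIBLE_DEVICES=$rank
exec numactl --cpunodebind="$numa" --membind="$numa" ./xhpl "$@"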