Thank you for this note. The comments on the issue you raised have saved me a lot of time.

I agree that the documentation around CPU assignment/allocation is
confusing and part of the problem. Of particular concern is that CPU
allocation changes when adding a GRES that has nothing to do with CPU
assignment. The non-deterministic behavior here is unsettling.

I guess I will need to look at a plugin so that we can ensure
deterministic behavior. We have already modified the Slurm source to
allow regular users to create their own reservations, so perhaps I might
get annoyed enough to 'fix' this in the Slurm code base.

Thank you!

Alan

On 6/7/24 18:36, Juergen Salk wrote:
> Hi Alan,
>
> unfortunately, process placement in Slurm is kind of black magic for
> sub-node jobs, i.e. jobs that allocate only a small number of CPUs of
> a node.
>
> I have recently raised a similar question here:
>
> https://support.schedmd.com/show_bug.cgi?id=19236
>
> And the bottom line was that to "really have control over task placement
> you really have to allocate the node in --exclusive manner".
>
> Best regards
> Jürgen
>
>
> * Alan Stange via slurm-users <slurm-users@lists.schedmd.com> [240607 14:52]:
>> All,
>>
>> I have a very simple slurm cluster. It's just a single system with 2
>> sockets and 16 cores in each socket. I would like to be able to submit
>> a simple task into this cluster, and to have the cpus assigned to that
>> task allocated round robin across the two sockets. Everything I try is
>> putting all the cpus for this single task on the same socket.
>>
>> I have not specified any CpuBind options in the slurm.conf file. For
>> example, if I try
>>
>> $ srun -c 4 --pty bash
>>
>> I get a shell prompt on the system, and can run
>>
>> $ taskset -cp $$
>> pid 12345 current affinity list: 0,2,4,6
>>
>> and I get this same set of cpus no matter what options I try (the
>> cluster is idle with no tasks consuming slots).
>>
>> I've tried various srun command line options like:
>>    --hint=compute_bound
>>    --hint=memory_bound
>>    various --cpu-bind options
>>    -B 2:2 -m block:cyclic and block:fcyclic
>>
>> Note that if I try to allocate 17 cpus, then I do get the 17th cpu
>> allocated on the 2nd socket.
>>
>>
>> What magic incantation is needed to get an allocation where the cpus are
>> chosen round robin across the sockets?
>>
>> Thank you!
>>
>> Alan
>>
>>
>> --
>> slurm-users mailing list -- slurm-users@lists.schedmd.com
>> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Phone: +49 (0)731 50-22478
> Fax: +49 (0)731 50-22471
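
For anyone repeating the `taskset -cp $$` check from the quoted message, here is a minimal Python sketch that reports how the CPUs granted to the current process are spread across sockets. It is only an illustration and not something from the thread. It assumes a Linux node where `os.sched_getaffinity` is available and where sysfs exposes `topology/physical_package_id` for each CPU. The helper names `socket_of` and `cpus_per_socket` are made up for this example.

```python
import os
from collections import Counter
from pathlib import Path


def socket_of(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the socket (physical package) id of a logical CPU, or None if unknown."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "physical_package_id"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Assumption: a missing or unreadable sysfs entry means "unknown socket".
        return None


def cpus_per_socket(cpus, lookup=socket_of):
    """Count how many of the given logical CPUs fall on each socket."""
    return Counter(lookup(cpu) for cpu in cpus)


def main():
    # Linux-only call: the set of CPUs this process is allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    print("affinity list:", ",".join(str(c) for c in allowed))
    counts = cpus_per_socket(allowed)
    # Sort numerically, with "unknown" (None) sockets last.
    for sock in sorted(counts, key=lambda s: (s is None, s or 0)):
        print(f"socket {sock}: {counts[sock]} cpu(s)")


if __name__ == "__main__":
    main()
```

Run inside the allocation (for example from the shell started by `srun -c 4 --pty bash`), a round-robin placement would show roughly equal counts per socket, while the behavior described above would show all CPUs on one socket.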