Dear all,

We frequently encounter Slurm across the WLCG, where it provides the slots in 
which we (ALICE) run our job pilots. With the emergence of more 
multicore-oriented workflows, these pilots have become heavily tasked with 
managing the resources within each slot, so as to make the best use of what we 
are given. With users often requesting arbitrary resources (CPU and memory in 
particular), and with several user payloads running in parallel in the same 
slot (as seen by the BQ), this has become increasingly challenging.

One interesting development is the arrival of cgroups v2, which provides a 
means of delegating controllers to unprivileged users. This is a very useful 
feature for our use case, as it would let us further subdivide the resources 
given to us within each slot, allowing the pilot to better "box in" each subjob.
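
To illustrate the kind of subdivision we have in mind, here is a minimal 
sketch in Python. It assumes the slot's cgroup has already been delegated to 
the pilot user; the function names, the "subjob0" name and the path in the 
usage comment are hypothetical and not taken from our diff. Note that cgroups 
v2 does not allow a non-root cgroup to both hold processes and enable 
controllers for its children, so the pilot would first have to move itself 
into a leaf cgroup of its own.

```python
from pathlib import Path


def enable_controllers(parent, controllers=("memory", "cpu")):
    """Enable the given controllers for the children of `parent`."""
    request = " ".join("+" + name for name in controllers)
    (Path(parent) / "cgroup.subtree_control").write_text(request)


def create_subjob_cgroup(parent, name, memory_max_bytes, pid=None):
    """Create a child cgroup with a hard memory limit, optionally moving `pid` into it."""
    child = Path(parent) / name
    child.mkdir()
    # memory.max is the hard memory limit of the child, in bytes.
    (child / "memory.max").write_text(str(memory_max_bytes))
    if pid is not None:
        # Writing a PID to cgroup.procs migrates that process into the child.
        (child / "cgroup.procs").write_text(str(pid))
    return child


# Hypothetical usage; the real path depends on how Slurm lays out its cgroups:
#   slot = "/sys/fs/cgroup/<path to the delegated slot cgroup>"
#   enable_controllers(slot)
#   create_subjob_cgroup(slot, "subjob0", 2 * 1024**3, pid=payload_pid)
```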

That said, in order to delegate controllers (e.g. for memory) to an 
unprivileged user, that user must be given ownership of the new cgroup 
directory created for them by Slurm, as well as of the cgroup.subtree_control 
and cgroup.procs files within that cgroup.
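
As a rough sketch of what this ownership change amounts to on the privileged 
side (written in Python for brevity; this is not how Slurm or our diff 
implements it, and delegate_cgroup is a hypothetical name): the cgroup 
directory and its delegation files are handed to the job's user. The kernel 
documentation also lists cgroup.threads among the files to delegate, so it is 
included here.

```python
import os
from pathlib import Path

# Interface files that must be writable by the delegatee.
DELEGATED_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def delegate_cgroup(cgroup_dir, uid, gid):
    """Give uid:gid ownership of a cgroup directory and its delegation files."""
    cgroup_dir = Path(cgroup_dir)
    os.chown(cgroup_dir, uid, gid)
    changed = [cgroup_dir]
    for name in DELEGATED_FILES:
        path = cgroup_dir / name
        if path.exists():  # skip files this kernel does not expose
            os.chown(path, uid, gid)
            changed.append(path)
    return changed
```

Resource limit files such as memory.max in the delegated cgroup itself stay 
owned by root, so the user can subdivide the slot but cannot raise its limits.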

I see that in v22.05, users were already given ownership of the newly created 
cgroup provided to them (albeit without the controller files), though this was 
later removed in commit b0e4223 
(https://github.com/SchedMD/slurm/commit/b0e422399f43e81903ead651d8da4430ebb8ec89), 
where the commit message suggests this behaviour should instead be avoided. 
With additional permissions on the files that were not delegated at that 
point, this feature would actually be complete for us. Could you please 
reconsider supporting unprivileged cgroups v2? For the record, here is the 
diff against v22.05 that allows us to further slice the allocated slot into 
smaller chunks: 
https://github.com/SchedMD/slurm/compare/slurm-22.05...zensanp:slurm:slurm-22.05

Best regards,
-Maxim Storetvedt
