The native job_container/tmpfs plugin would certainly have access to the job 
record, so modifying it (or a forked variant) would be possible.  A SPANK 
plugin should be able to fetch the full job record [1] and can then inspect 
the "gres" list (as a C string), which means I could modify UD's auto_tmpdir 
accordingly -- the first sketch below shows roughly what that lookup could 
look like.

Having a compiled plugin execute xfs_quota to effect the commands illustrated 
below wouldn't be a great idea -- luckily Linux exposes the XFS quota 
controls programmatically.  Seemingly not the simplest API, but xfsprogs is a 
working example, and the second sketch below covers the two calls that 
matter here.
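
Something along these lines -- a minimal sketch only, not tested against any 
particular release.  The "gres/tmp" name is a made-up site-defined GRES, and 
which field actually carries the request string (tres_per_node here, but 
possibly tres_per_job or gres_detail_str) varies with the Slurm version and 
with how the job was submitted:

#include <stdint.h>
#include <string.h>
#include <slurm/slurm.h>
#include <slurm/spank.h>

SPANK_PLUGIN(tmp_quota, 1);

/* Runs in slurmd context as part of the job prolog. */
int slurm_spank_job_prolog(spank_t sp, int ac, char **av)
{
    uint32_t job_id;
    job_info_msg_t *msg = NULL;

    if (spank_get_item(sp, S_JOB_ID, &job_id) != ESPANK_SUCCESS)
        return -1;

    /* Pull the full job record from slurmctld. */
    if (slurm_load_job(&msg, job_id, SHOW_DETAIL) != SLURM_SUCCESS)
        return -1;

    if (msg->record_count > 0) {
        slurm_job_info_t *job = &msg->job_array[0];
        /* The GRES request shows up as a plain C string, e.g.
         * "gres/tmp:100G" -- "tmp" being a site-defined GRES. */
        const char *tres = job->tres_per_node;
        const char *p = (tres != NULL) ? strstr(tres, "gres/tmp:") : NULL;
        if (p != NULL)
            slurm_info("job %u tmp request: %s", job_id, p);
    }

    slurm_free_job_info_msg(msg);
    return 0;
}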
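
On the quota side, these are the two calls that replace the xfs_quota 
invocations quoted below.  Again just a sketch under assumptions: 
set_job_tmp_quota() is a hypothetical helper, the headers have moved around 
over the years (older distros want xfs/xfs_fs.h rather than linux/fs.h for 
the fsxattr bits), and none of it does anything unless the file system is 
mounted with the prjquota option:

#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/quota.h>
#include <linux/fs.h>          /* struct fsxattr, FS_IOC_FS[GS]ETXATTR */
#include <linux/dqblk_xfs.h>   /* struct fs_disk_quota, Q_XSETQLIM     */

#ifndef PRJQUOTA
#define PRJQUOTA 2             /* older glibc headers omit this        */
#endif

static int set_job_tmp_quota(const char *dir, const char *blkdev,
                             uint32_t projid, uint64_t limit_bytes)
{
    /* 1. Tag the directory (and, via PROJINHERIT, everything created
     *    under it) with the project id -- the 'project -s' step. */
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    struct fsxattr fsx;
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
        close(fd);
        return -1;
    }
    fsx.fsx_projid = projid;
    fsx.fsx_xflags |= FS_XFLAG_PROJINHERIT;
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* 2. Set the block hard limit -- the 'limit -p bhard=...' step.
     *    d_blk_hardlimit counts 512-byte basic blocks. */
    struct fs_disk_quota dq = { 0 };
    dq.d_version       = FS_DQUOT_VERSION;
    dq.d_flags         = FS_PROJ_QUOTA;
    dq.d_id            = projid;
    dq.d_fieldmask     = FS_DQ_BHARD;
    dq.d_blk_hardlimit = limit_bytes / 512;

    return quotactl(QCMD(Q_XSETQLIM, PRJQUOTA), blkdev,
                    (int)projid, (caddr_t)&dq);
}

With that in place the prolog flow would be: parse the size out of the GRES 
string, mkdir the per-job directory, then call the helper with the job id as 
the project id; the epilog zeroes the limit the same way the quoted 
transcript does.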




[1] https://gitlab.hpc.cineca.it/dcesari1/slurm-msrsafe



> On Feb 7, 2024, at 05:25, Tim Schneider via slurm-users 
> <slurm-users@lists.schedmd.com> wrote:
> 
> Hey Jeffrey,
> thanks for this suggestion! This is probably the way to go if one can find a 
> way to access GRES in the prolog. I read somewhere that people were calling 
> scontrol to get this information, but this seems a bit unclean. Anyway, if I 
> find some time I will try it out.
> Best,
> Tim
> On 2/6/24 16:30, Jeffrey T Frey wrote:
>> Most of my ideas have revolved around creating file systems on-the-fly as 
>> part of the job prolog and destroying them in the epilog.  The issue with 
>> that mechanism is that formatting a file system (e.g. mkfs.<type>) can be 
>> time-consuming.  E.g. formatting your local scratch SSD as an LVM PV+VG and 
>> allocating per-job volumes, you'd still need to run e.g. mkfs.xfs and 
>> mount the new file system. 
>> 
>> 
>> ZFS file system creation is much quicker (basically combines the LVM + mkfs 
>> steps above) but I don't know of any clusters using ZFS to manage local file 
>> systems on the compute nodes :-)
>> 
>> 
>> One could leverage XFS project quotas.  E.g. for Slurm job 2147483647:
>> 
>> 
>> [root@r00n00 /]# mkdir /tmp-alloc/slurm-2147483647
>> [root@r00n00 /]# xfs_quota -x -c 'project -s -p /tmp-alloc/slurm-2147483647 2147483647' /tmp-alloc
>> Setting up project 2147483647 (path /tmp-alloc/slurm-2147483647)...
>> Processed 1 (/etc/projects and cmdline) paths for project 2147483647 with recursion depth infinite (-1).
>> [root@r00n00 /]# xfs_quota -x -c 'limit -p bhard=1g 2147483647' /tmp-alloc
>> [root@r00n00 /]# cd /tmp-alloc/slurm-2147483647
>> [root@r00n00 slurm-2147483647]# dd if=/dev/zero of=zeroes bs=5M count=1000
>> dd: error writing ‘zeroes’: No space left on device
>> 205+0 records in
>> 204+0 records out
>> 1073741824 bytes (1.1 GB) copied, 2.92232 s, 367 MB/s
>> 
>>    :
>> 
>> [root@r00n00 /]# rm -rf /tmp-alloc/slurm-2147483647
>> [root@r00n00 /]# xfs_quota -x -c 'limit -p bhard=0 2147483647' /tmp-alloc
>> 
>> 
>> Since Slurm jobids max out at 0x03FFFFFF (and 2147483647 = 0x7FFFFFFF), we 
>> have an easy on-demand project id to use on the file system.  Slurm tmpfs 
>> plugins have to do a mkdir to create the per-job directory, adding two 
>> xfs_quota commands (which run in more or less O(1) time) won't extend the 
>> prolog by much. Likewise, Slurm tmpfs plugins have to scrub the directory at 
>> job cleanup, so adding another xfs_quota command will not do much to change 
>> their epilog execution times.  The main question is "where does the tmpfs 
>> plugin find the quota limit for the job?"
>> 
>> 
>> 
>> 
>> 
>>> On Feb 6, 2024, at 08:39, Tim Schneider via slurm-users 
>>> <slurm-users@lists.schedmd.com> wrote:
>>> 
>>> Hi,
>>> 
>>> In our SLURM cluster, we are using the job_container/tmpfs plugin to ensure 
>>> that each user can use /tmp and it gets cleaned up after them. Currently, 
>>> we are mapping /tmp into the node's RAM, which means that the cgroups make 
>>> sure that users can only use a certain amount of storage inside /tmp.
>>> 
>>> Now we would like to make use of the node's local SSD instead of its RAM to hold 
>>> the files in /tmp. I have seen people define local storage as GRES, but I 
>>> am wondering how to make sure that users do not exceed the storage space 
>>> they requested in a job. Does anyone have an idea how to configure local 
>>> storage as a proper tracked resource?
>>> 
>>> Thanks a lot in advance!
>>> 
>>> Best,
>>> 
>>> Tim
>>> 
>>> 
>> 
> 


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
