Do you have that resource handy? I looked through the cgroups documentation but found very little in the way of tutorials on modifying the device permissions.
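From what I could piece together from the kernel docs, doing it by hand would look roughly like the sketch below. This assumes cgroup v1 with the devices controller mounted at /sys/fs/cgroup/devices, and that the NVIDIA character devices use major number 195 (worth confirming with ls -l /dev/nvidia*); the 'login' cgroup name is just an example:

  # create a cgroup for interactive login sessions (name is arbitrary)
  sudo mkdir /sys/fs/cgroup/devices/login

  # deny read/write/mknod on all character devices with major 195 (the NVIDIA nodes)
  echo 'c 195:* rwm' | sudo tee /sys/fs/cgroup/devices/login/devices.deny

  # move the current shell into that cgroup
  echo $$ | sudo tee /sys/fs/cgroup/devices/login/cgroup.procs

  # nvidia-smi run from this shell should now fail to open the GPUs
  nvidia-smi

Note that /dev/nvidia-uvm may use a different, dynamically assigned major and would need its own deny rule, and something like pam_exec or a systemd slice would be needed to place every SSH session into such a cgroup automatically rather than doing it by hand.
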
On Mon, May 20, 2019 at 2:45 AM John Hearns <hear...@googlemail.com> wrote:

> Two replies here.
> First off, for normal user logins you can direct them into a cgroup - I
> looked into this about a year ago and it was actually quite easy.
> As I remember there is a service or utility available which does just
> that. Of course the user cgroup would not have
>
> Expanding on my theme, it is probably a good idea then to have all the
> system processes contained in a 'boot cpuset' - that is, at system boot time
> allocate a small number of cores to the system daemons, Slurm processes
> and probably the user login sessions,
> thus freeing up the other CPUs for batch jobs exclusively.
>
> Also, you could try simply setting CUDA_VISIBLE_DEVICES to null in one of
> the system-wide login scripts.
>
> On Mon, 20 May 2019 at 08:38, Nathan Harper <nathan.har...@cfms.org.uk> wrote:
>
>> This doesn't directly answer your question, but in Feb last year on the
>> ML there was a discussion about limiting user resources on login nodes
>> (Stopping compute usage on login nodes). Some of the suggestions
>> included the use of cgroups to do so, and it's possible that those methods
>> could be extended to limit access to GPUs, so it might be worth looking
>> into.
>>
>> On Sat, 18 May 2019 at 00:28, Dave Evans <rdev...@ece.ubc.ca> wrote:
>>
>>> We are using a single-system "cluster" and want some control of fair use
>>> of the GPUs. The users are not supposed to be able to use the GPUs until
>>> they have allocated the resources through Slurm. We have no head node, so
>>> slurmctld, slurmdbd, and slurmd are all run on the same system.
>>>
>>> I have a configuration working now such that the GPUs can be scheduled
>>> and allocated.
>>> However, logging into the system before allocating GPUs gives full access
>>> to all of them.
>>>
>>> I would like to configure Slurm cgroups to disable access to GPUs until
>>> they have been allocated.
>>>
>>> On first login, I get:
>>>
>>> nvidia-smi -q | grep UUID
>>>     GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>>>     GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>>>     GPU UUID : GPU-176d0514-0cf0-df71-e298-72d15f6dcd7f
>>>     GPU UUID : GPU-af03c80f-6834-cb8c-3133-2f645975f330
>>>     GPU UUID : GPU-ef10d039-a432-1ac1-84cf-3bb79561c0d3
>>>     GPU UUID : GPU-38168510-c356-33c9-7189-4e74b5a1d333
>>>     GPU UUID : GPU-3428f78d-ae91-9a74-bcd6-8e301c108156
>>>     GPU UUID : GPU-c0a831c0-78d6-44ec-30dd-9ef5874059a5
>>>
>>> And running from the queue:
>>>
>>> srun -N 1 --gres=gpu:2 nvidia-smi -q | grep UUID
>>>     GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>>>     GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>>>
>>> Pastes of my config files are:
>>>
>>> ## slurm.conf ##
>>> https://pastebin.com/UxP67cA8
>>>
>>> ## cgroup.conf ##
>>> CgroupAutomount=yes
>>> CgroupReleaseAgentDir="/etc/slurm/cgroup"
>>>
>>> ConstrainCores=yes
>>> ConstrainDevices=yes
>>> ConstrainRAMSpace=yes
>>> #TaskAffinity=yes
>>>
>>> ## cgroup_allowed_devices_file.conf ##
>>> /dev/null
>>> /dev/urandom
>>> /dev/zero
>>> /dev/sda*
>>> /dev/cpu/*/*
>>> /dev/pts/*
>>> /dev/nvidia*
>>
>> --
>> Nathan Harper // IT Systems Lead
>> e: nathan.har...@cfms.org.uk  t: 0117 906 1104  m: 0787 551 0891
>> w: www.cfms.org.uk
>> CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent //
>> Emersons Green // Bristol // BS16 7FR
>>
>> CFMS Services Ltd is registered in England and Wales No 05742022 - a
>> subsidiary of CFMS Ltd
>> CFMS Services Ltd registered office // 43 Queens Square // Bristol //
>> BS1 4QP
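
On John's CUDA_VISIBLE_DEVICES idea: as a stop-gap that could be something like the snippet below in a system-wide login script (the path /etc/profile.d/hide_gpus.sh is just an example). Slurm exports its own CUDA_VISIBLE_DEVICES inside jobs that are allocated GPUs through gres, so batch jobs should be unaffected, but it is only advisory - a user can unset it, and nvidia-smi talks to the driver directly so it will still list the cards:

  # /etc/profile.d/hide_gpus.sh (example path)
  # Hide the GPUs from CUDA applications in interactive shells.
  # Slurm sets CUDA_VISIBLE_DEVICES itself for steps with a gres/gpu
  # allocation, so jobs launched through srun/sbatch are unaffected.
  if [ -z "$SLURM_JOB_ID" ]; then
      export CUDA_VISIBLE_DEVICES=""
  fi

That keeps well-behaved CUDA programs off the GPUs, but the ConstrainDevices route is still what I would like to get working for real enforcement.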