Sorry Dave, nothing handy. However, have a look at this writeup from You Know Who:
https://pbspro.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups
Look at the devices subsystem.

You will need the major device number for the Nvidia devices. For example, on my system:

crw-rw-rw- 1 root root 195,   0 Mar  1 12:16 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Mar  1 12:16 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Mar  1 12:16 /dev/nvidiactl

Looking in /sys/fs/cgroup/devices there are two files:

--w------- 1 root root 0 May 21 12:28 devices.allow
--w------- 1 root root 0 May 21 12:28 devices.deny

These have the rather interesting property of being write-only ... so for a particular cgroup:

echo "c 195:* rw" > devices.deny

should deny access to character devices with major number 195.

https://docs.oracle.com/cd/E37670_01/E41138/html/ol_devices_cgroups.html
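To try the same thing by hand before wiring it into Slurm, here is a rough, untested sketch. It assumes a cgroup v1 devices controller mounted at /sys/fs/cgroup/devices, root access, major number 195 as above, and "nogpu" is just an example cgroup name:

  # create a cgroup and deny it the NVIDIA character devices
  # ("rwm" = read, write, mknod - slightly stricter than the "rw" above)
  mkdir /sys/fs/cgroup/devices/nogpu
  echo "c 195:* rwm" > /sys/fs/cgroup/devices/nogpu/devices.deny
  # move the current shell into the cgroup; anything started from it inherits the deny
  echo $$ > /sys/fs/cgroup/devices/nogpu/tasks
  # nvidia-smi run from this shell should now fail to open /dev/nvidia*

That is roughly what Slurm's ConstrainDevices does for each job, except Slurm then re-allows only the devices granted by the job's gres allocation.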
On Tue, 21 May 2019 at 01:28, Dave Evans <rdev...@ece.ubc.ca> wrote:

> Do you have that resource handy? I looked into the cgroups documentation
> but I see very little in the way of tutorials on modifying the permissions.
>
> On Mon, May 20, 2019 at 2:45 AM John Hearns <hear...@googlemail.com> wrote:
>
>> Two replies here.
>> First off, for normal user logins you can direct them into a cgroup - I
>> looked into this about a year ago and it was actually quite easy.
>> As I remember there is a service or utility available which does just that.
>> Of course the user cgroup would not have access to the GPU devices.
>>
>> Expanding on my theme, it is probably a good idea then to have all the
>> system processes contained in a 'boot cpuset' - that is, at system boot time
>> allocate a small number of cores to the system daemons, Slurm processes
>> and probably the user login sessions, thus freeing up the other CPUs for
>> batch jobs exclusively.
>>
>> Also, you could try simply setting CUDA_VISIBLE_DEVICES to null in one of
>> the system-wide login scripts.
>>
>> On Mon, 20 May 2019 at 08:38, Nathan Harper <nathan.har...@cfms.org.uk> wrote:
>>
>>> This doesn't directly answer your question, but in Feb last year on the
>>> ML there was a discussion about limiting user resources on login nodes
>>> (Stopping compute usage on login nodes). Some of the suggestions
>>> included the use of cgroups to do so, and it's possible that those methods
>>> could be extended to limit access to GPUs, so it might be worth looking into.
>>>
>>> On Sat, 18 May 2019 at 00:28, Dave Evans <rdev...@ece.ubc.ca> wrote:
>>>
>>>> We are using a single-system "cluster" and want some control of fair
>>>> use of the GPUs. The users are not supposed to be able to use the
>>>> GPUs until they have allocated the resources through Slurm. We have no
>>>> head node, so slurmctld, slurmdbd, and slurmd all run on the same system.
>>>>
>>>> I have a configuration working now such that the GPUs can be scheduled
>>>> and allocated.
>>>> However, logging into the system before allocating GPUs gives full
>>>> access to all of them.
>>>>
>>>> I would like to configure Slurm cgroups to disable access to the GPUs
>>>> until they have been allocated.
>>>>
>>>> On first login, I get:
>>>>
>>>> nvidia-smi -q | grep UUID
>>>>     GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>>>>     GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>>>>     GPU UUID : GPU-176d0514-0cf0-df71-e298-72d15f6dcd7f
>>>>     GPU UUID : GPU-af03c80f-6834-cb8c-3133-2f645975f330
>>>>     GPU UUID : GPU-ef10d039-a432-1ac1-84cf-3bb79561c0d3
>>>>     GPU UUID : GPU-38168510-c356-33c9-7189-4e74b5a1d333
>>>>     GPU UUID : GPU-3428f78d-ae91-9a74-bcd6-8e301c108156
>>>>     GPU UUID : GPU-c0a831c0-78d6-44ec-30dd-9ef5874059a5
>>>>
>>>> And running from the queue:
>>>>
>>>> srun -N 1 --gres=gpu:2 nvidia-smi -q | grep UUID
>>>>     GPU UUID : GPU-6076ce0a-bc03-a53c-6616-0fc727801c27
>>>>     GPU UUID : GPU-5620ec48-7d76-0398-9cc1-f1fa661274f3
>>>>
>>>> Pastes of my config files are:
>>>>
>>>> ## slurm.conf ##
>>>> https://pastebin.com/UxP67cA8
>>>>
>>>> ## cgroup.conf ##
>>>> CgroupAutomount=yes
>>>> CgroupReleaseAgentDir="/etc/slurm/cgroup"
>>>>
>>>> ConstrainCores=yes
>>>> ConstrainDevices=yes
>>>> ConstrainRAMSpace=yes
>>>> #TaskAffinity=yes
>>>>
>>>> ## cgroup_allowed_devices_file.conf ##
>>>> /dev/null
>>>> /dev/urandom
>>>> /dev/zero
>>>> /dev/sda*
>>>> /dev/cpu/*/*
>>>> /dev/pts/*
>>>> /dev/nvidia*
>>>
>>> --
>>> Nathan Harper // IT Systems Lead
>>> e: nathan.har...@cfms.org.uk  t: 0117 906 1104  m: 0787 551 0891  w: www.cfms.org.uk
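Regarding the CUDA_VISIBLE_DEVICES suggestion quoted above: a minimal sketch of such a system-wide login script (the path and file name are just an example) would be

  # /etc/profile.d/hide-gpus.sh  (example name)
  # Hide the GPUs from ordinary login shells. Slurm normally sets
  # CUDA_VISIBLE_DEVICES itself for jobs that request --gres=gpu, so
  # batch jobs still see the GPUs they were allocated.
  export CUDA_VISIBLE_DEVICES=""

Bear in mind this only discourages casual use - a user can simply unset the variable - whereas the devices cgroup approach actually blocks access to /dev/nvidia*.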