Hi,
 
Shooktija S N via slurm-users <slurm-users@lists.schedmd.com> writes:

> Hi,
>
> I am a complete slurm-admin and sys-admin noob trying to set up a 3 node 
> Slurm cluster. I have managed to get a minimum working example running, in
> which I am able to use a GPU (NVIDIA GeForce RTX 4070 ti) as a GRES. 
>
> This is slurm.conf without the comment lines:
> root@server1:/etc/slurm# grep -v "#" slurm.conf
> ClusterName=DlabCluster
> SlurmctldHost=server1
> GresTypes=gpu
> ProctrackType=proctrack/linuxproc
> ReturnToService=1
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/spool/slurmd
> SlurmUser=root
> StateSaveLocation=/var/spool/slurmctld
> TaskPlugin=task/affinity,task/cgroup
> InactiveLimit=0
> KillWait=30
> MinJobAge=300
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_tres
> JobCompType=jobcomp/none
> JobAcctGatherFrequency=30
> SlurmctldDebug=info
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=debug3
> SlurmdLogFile=/var/log/slurmd.log
> NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:1
> PartitionName=mainPartition Nodes=ALL Default=YES MaxTime=INFINITE State=UP
> This is gres.conf (only one line); each node has been assigned its corresponding NodeName:
> root@server1:/etc/slurm# cat gres.conf
> NodeName=server1 Name=gpu File=/dev/nvidia0
> Those are the only config files I have.
>
> I have a few general questions, loosely arranged in ascending order of 
> generality:
>
> 1) I have enabled the allocation of GPU resources as a GRES and have tested 
> this by running:
> shookti@server1:~$ srun --nodes=3 --gpus=3 --label hostname
> 2: server3
> 0: server1
> 1: server2
> Is this a good way to check if the configs have worked correctly? How else 
> can I easily check if the GPU GRES has been properly configured?

What do you mean by 'properly configured'?  Your srun test shows that
the allocation succeeds, but not that the GPUs are actually usable from
within a job.  Ultimately you will want to submit a job which requests a
GPU and run something like 'nvidia-smi' inside it to see whether the GPU
is actually visible and being used.
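
For example, a minimal test job could look something like this (the
partition name is taken from your slurm.conf; the rest is just a
sketch):

  #!/bin/bash
  #SBATCH --partition=mainPartition
  #SBATCH --gres=gpu:1
  #SBATCH --time=00:05:00
  # print the GPUs visible to this job
  nvidia-smi

Submit it with 'sbatch' and check the output file; 'scontrol show node
server1' should also list the GPU in its Gres= line.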

> 2) I want to reserve a few CPU cores, and a few gigs of memory for use by non 
> slurm related tasks. According to the documentation, I am to use
> CoreSpecCount and MemSpecLimit to achieve this. The documentation for 
> CoreSpecCount says "the Slurm daemon slurmd may either be confined to these
> resources (the default) or prevented from using these resources", how do I 
> change this default behaviour to have the config specify the cores reserved 
> for non
> slurm stuff instead of specifying how many cores slurm can use?

I am not aware that this is possible.
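
In case it is useful anyway, the reservation itself is just two extra
parameters on the node definition in slurm.conf, along these lines (the
numbers are only placeholders; MemSpecLimit is in MB):

  # reserve 4 cores and 8 GB of RAM per node for system use
  NodeName=server[1-3] RealMemory=128636 Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 CoreSpecCount=4 MemSpecLimit=8192 State=UNKNOWN Gres=gpu:1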

> 3) While looking up examples online on how to run Python scripts inside a 
> conda env, I have seen that the line 'module load conda' should be run before
> running 'conda activate myEnv' in the sbatch submission script. The command 
> 'module' did not exist until I installed the apt package 
> 'environment-modules',
> but now I see that conda is not listed as a module that can be loaded when I 
> check using the command 'module avail'. How do I fix this?

Environment modules and Conda are somewhat orthogonal to each other.

Environment modules is a mechanism for manipulating environment
variables such as PATH and LD_LIBRARY_PATH.  It allows you to provide
easy access for all users to software which has been centrally installed
in non-standard paths.  It is not used to provide access to software
installed via 'apt'.

Conda is another approach to providing non-standard software, but is
usually used by individual users to install programs in their own home
directories.

You can use environment modules to allow access to a different version
of Conda than the one you get via 'apt', but there is no necessity to do
that. 
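
If you did at some point want to provide a central Conda installation
via a module, the modulefile is just a small Tcl file, something like
this (assuming, purely as an example, an installation under /opt/conda):

  #%Module1.0
  ## hypothetical modulefile for a centrally installed Conda
  prepend-path PATH /opt/conda/bin

Placed in a directory listed in $MODULEPATH under a name such as
'conda', it will then appear in 'module avail' and can be loaded with
'module load conda'.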

> 4) A very broad question: while managing the resources being used by a 
> program, slurm might happen to split the resources across multiple computers 
> that
> might not necessarily have the files required by this program to run. For 
> example, a python script that requires the package 'numpy' to function but 
> that
> package was not installed on all of the computers. How are such things dealt 
> with? Is the module approach meant to fix this problem? In my previous
> question, if I had a python script that users usually run just by running a 
> command like 'python3 someScript.py' instead of running it within a conda
> environment, how should I enable slurm to manage the resources required by 
> this script? Would I have to install all the packages required by this script 
> on all
> the computers that are in the cluster?

In general a distributed or cluster file system, such as NFS, Ceph or
Lustre, is used to make the same files visible on all the nodes.  /home
would be on such a file system, as would a large part of the software.
You can use something like EasyBuild, which will install software into
such a shared location and generate the relevant module files.
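
As a sketch (the package name here is made up), that workflow looks
roughly like:

  # build the package and its dependencies into the shared software tree
  eb SomeSoftware-1.2.3.eb --robot
  # the generated modulefile is then visible to all users
  module avail SomeSoftware
  module load SomeSoftware/1.2.3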

> 5) Related to the previous question: I have set up my 3 nodes in such a way 
> that all the users' home directories are stored on a ceph cluster created 
> using the
> hard drives from all the 3 nodes, which essentially means that a user's home 
> directory is mounted at the same location on all 3 computers - making a user's
> data visible to all 3 nodes. Does this make the process of managing the 
> dependencies of a program as described in the previous question easier? I 
> realise that
> programs having to read and write to files on the hard drives of a ceph 
> cluster is not really the fastest so I am planning on having users use the 
> /tmp/ directory
> for speed critical reading and writing, as the OSs have been installed
> on NVME drives.

Depending on the IO patterns a piece of software creates, using the
distributed file system might be fine, or a local disk might be needed.
Note that you might run into problems with /tmp filling up, so it may
be better to have a separate /localscratch.  In general you probably
also want people to use as much RAM as possible in order to avoid
file-system IO altogether, if that is feasible.
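
If you do set up something like /localscratch, a job script can stage
its IO-heavy work there along these lines (the path and the clean-up
are just one way of doing it):

  #!/bin/bash
  #SBATCH --partition=mainPartition
  # per-job scratch directory on the node-local NVMe disk
  SCRATCH=/localscratch/$SLURM_JOB_ID
  mkdir -p "$SCRATCH"
  cd "$SCRATCH"
  # ... run the IO-heavy work here, e.g. python3 someScript.py ...
  # copy anything worth keeping back to the Ceph-backed home directory,
  # then clean up so the local disk does not fill up
  cp -r "$SCRATCH" "$HOME/results_$SLURM_JOB_ID"
  cd && rm -rf "$SCRATCH"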

HTH

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
