Robert,

There is a pretty good consensus here that the RPMs Bright provides do not
support NVML.
If you need this functionality and do not want to build your own RPMs on a
node with the NVIDIA drivers installed, have you considered contacting Bright
support? That would be the best route, since this is clearly not a Slurm
issue but a build configuration issue.

--
Davide Vanzo, PhD
Computer Scientist
BioHPC – Lyda Hill Dept. of Bioinformatics
UT Southwestern Medical Center

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Robert Kudyba
Sent: Wednesday, April 8, 2020 2:17 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

> and the NVIDIA Management Library (NVML) is installed on the node and
> was found during Slurm configuration

That's the key phrase: whoever compiled Slurm ran ./configure on a system
without the NVIDIA libraries and headers present, so NVML support could not
be compiled in.

You'll need to redo the build on a system that has the NVIDIA libraries and
headers installed in order for this to work.
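
One quick way to check an existing installation is to look for the NVML GPU
plugin among the Slurm plugins. The following is only a sketch: the helper
name has_nvml_plugin is hypothetical, and the plugin file name gpu_nvml.so
and the directory passed in are assumptions about how the packages lay out
their files, so adjust both to the local installation.

```shell
# Sketch: report whether a Slurm plugin directory contains the NVML GPU plugin.
# Assumption: the plugin is shipped as gpu_nvml.so inside the plugin directory.
has_nvml_plugin() {
    plugin_dir="$1"
    if [ -f "$plugin_dir/gpu_nvml.so" ]; then
        echo "NVML plugin found in $plugin_dir"
        return 0
    fi
    echo "NVML plugin missing in $plugin_dir"
    return 1
}
```

On a Bright cluster the call might look like
has_nvml_plugin /cm/shared/apps/slurm/current/lib64/slurm, where the
lib64/slurm subdirectory is a guess. If the plugin is missing, the packages
were most likely built without the NVML headers.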

As I wrote, we use Bright Cluster on CentOS 7.7, so we just follow their
instructions
(https://support.brightcomputing.com/manuals/8.2/admin-manual.pdf#subsection.7.5.1)
and run yum install slurm20. The manual shows Slurm 19, but the steps are the
same for 20:
Example
[root@bright82 ~]# rpm -qa | grep slurm | xargs -p rpm -e
[root@bright82 ~]# rpm -qa -r /cm/images/default-image |grep slurm |xargs -p rpm -r /cm/images/default-image -e
[root@bright82 ~]# yum install slurm19-client slurm19-slurmdbd slurm19-perlapi slurm19-contribs slurm19
[root@bright82 ~]# yum install --installroot=/cm/images/default-image slurm19-client
If either slurm or slurm19 is installed, then the administrator can run
wlm-setup using the workload manager name slurm (that is, without the 19
suffix) to set up Slurm. The roles at node level or category level,
slurmserver and slurmclient, work with either Slurm version.
Configuring Slurm
After package setup is done with wlm-setup (section 7.3), Slurm software 
components are installed in /cm/shared/apps/slurm/current.
Slurm clients and servers can be configured to some extent via role assignment 
(sections 7.4.1 and 7.4.2). Using cmsh, advanced option parameters can be set 
under the slurmclient role:
For example, the number of cores per socket can be set:
Example
[bright82->category[default]->roles[slurmclient]]% set corespersocket 2
[bright82->category*[default*]->roles*[slurmclient*]]% commit
In order to configure generic resources, the genericresources mode can be used 
to set a list of objects. Each object then represents one generic resource 
available on nodes. Each value of name in genericresources must already be 
defined in the list of GresTypes. The list of GresTypes is defined in the 
slurmserver role. Several generic resource entries can have the same value for
name (for example gpu), but each must have a unique alias. The alias is a string
that is used to manage the resource entry in cmsh or in Bright View. The string 
is enclosed in square brackets in cmsh, and is used instead of the name for the 
object. The alias does not affect Slurm configuration.

For example, to add two GPUs for all the nodes in the default category which 
are of type k20xm, and to assign them to different CPU cores, the following 
cmsh commands can be run:
Example
[bright82]% category use default
[bright82->category[default]]% roles
[bright82->category[default]->roles]% use slurmclient
[...[slurmclient]]% genericresources
[...[slurmclient]->genericresources]% add gpu0
[...[slurmclient*]->genericresources*[gpu0*]]% set name gpu
[...[slurmclient*]->genericresources*[gpu0*]]% set file /dev/nvidia0
[...[slurmclient*]->genericresources*[gpu0*]]% set cores 0-7
[...[slurmclient*]->genericresources*[gpu0*]]% set type k20xm
[...[slurmclient*]->genericresources*[gpu0*]]% add gpu1
[...[slurmclient*]->genericresources*[gpu1*]]% set name gpu
[...[slurmclient*]->genericresources*[gpu1*]]% set file /dev/nvidia1
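
In Slurm itself, a generic resource like this is described by a line in
gres.conf using the Name, Type, File and Cores keys. As a rough sketch, and
assuming Bright maps the cmsh fields one-to-one onto those keys (the manual
excerpt above does not confirm this), the gpu0 entry would correspond to the
line printed below. gres_line is a hypothetical helper used only for
illustration; it is not part of Slurm or Bright.

```shell
# Sketch: print a gres.conf-style line from the fields set in cmsh.
# gres_line is a hypothetical helper, not a Slurm or Bright command.
gres_line() {
    name="$1"; type="$2"; file="$3"; cores="$4"
    printf 'Name=%s Type=%s File=%s Cores=%s\n' "$name" "$type" "$file" "$cores"
}

# Values taken from the gpu0 entry in the cmsh example.
gres_line gpu k20xm /dev/nvidia0 0-7
```

Comparing a line like this with the gres.conf that Bright actually generates
on a node is a simple way to confirm that the cmsh settings were applied.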
