Dear all,

We have the same problem with RHEL 7.7 and Slurm 19.05.5.
Can any of you help us find a solution?

We are currently using "SelectType=select/cons_res". Do we perhaps need "SelectType=select/cons_tres" instead?
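
For reference, the change in question is a small edit in slurm.conf; the fragment below is only a sketch. The SelectTypeParameters value is an assumption (keep whatever value is already used with select/cons_res), and we have not confirmed that cons_tres changes the --accel-bind verbose behaviour. select/cons_tres was introduced in Slurm 19.05 and is the plugin behind the GPU-specific options such as --gpus-per-task. As far as we know, changing SelectType requires restarting slurmctld and the slurmd daemons, so check the slurm.conf man page before doing this on a production cluster.

```
# slurm.conf (sketch): switch the resource selection plugin.
# CR_Core is an assumed value; reuse the existing SelectTypeParameters.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
```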

Kind regards,
Danny Rotscher

On 27.11.19 at 07:47, Uemoto, Tomoki wrote:
Hi all,

OS Version: RHEL 7.6
SLURM Version: slurm 18.08.6

I defined the GPU resources as follows:

   [test@ohpc137pbsop-c001 ~]$ scontrol show config |grep TaskPlugin
   TaskPlugin              = task/cgroup
   TaskPluginParam         = (null type)
   [test@ohpc137pbsop-c001 ~]$
   [test@ohpc137pbsop-c001 ~]$ grep Gres /etc/slurm/slurm.conf
   GresTypes=gpu
   NodeName=ohpc137pbsop-c001 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
   NodeName=ohpc137pbsop-c002 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:2 State=IDLE
   [test@ohpc137pbsop-c001 ~]$

   [test@ohpc137pbsop-c001 ~]$ cat /etc/slurm/gres.conf
   Name=gpu File=/dev/tty0 Cores=0,1
   Name=gpu File=/dev/tty1 Cores=0,1
   [test@ohpc137pbsop-c001 ~]$

   [root@ohpc137pbsop-sms ~]# cat /etc/slurm/cgroup.conf
   ###
   #
   # Slurm cgroup support configuration file
   #
   # See man slurm.conf and man cgroup.conf for further
   # information on cgroup configuration parameters
   #--
   ConstrainCores=yes
   TaskAffinity=yes
   CgroupMountpoint=/cgroup
   CgroupAutomount=yes
   ConstrainRAMSpace=yes
   [root@ohpc137pbsop-sms ~]#
   [root@ohpc137pbsop-sms ~]# scontrol show node |grep Gres
    Gres=gpu:2
    Gres=gpu:2
   [root@ohpc137pbsop-sms ~]#

And I executed the following script.

   [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v,g -l hostname
   0: ohpc137pbsop-c001
   2: ohpc137pbsop-c002
   1: ohpc137pbsop-c001
   3: ohpc137pbsop-c002
   [test@ohpc137pbsop-sms ~]$ srun -l --gres=gpu:2 -n4 --accel-bind=v -l hostname
   2: ohpc137pbsop-c002
   0: ohpc137pbsop-c001
   3: ohpc137pbsop-c002
   1: ohpc137pbsop-c001
   [test@ohpc137pbsop-sms ~]$

   No task binding information is printed.
   Is the verbose mode of --accel-bind not supported in this version (Slurm 18.08.6)?

   The verbose mode of --cpu-bind, on the other hand, does work:
   [test@ohpc137pbsop-sms ~]$ srun -c1 --cpu-bind=v hostname
   cpu-bind=NULL - ohpc137pbsop-c001, task  0  0 [22822]: mask 0x1000001
   ohpc137pbsop-c001
   cpu-bind=NULL - ohpc137pbsop-c001, task  1  1 [22823]: mask 0x1000001
   ohpc137pbsop-c001
   [test@ohpc137pbsop-sms ~]$
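
Until the verbose output is available, the GPU binding can also be checked from inside each task. This is a minimal sketch, not a Slurm feature: show_binding is a hypothetical helper, and it assumes that the gres/gpu plugin exports CUDA_VISIBLE_DEVICES to each task and that the compute nodes run Linux (it reads /proc/self/status).

```shell
# Hypothetical helper: report what Slurm exposed to the current task.
show_binding() {
  local rank gpus cpus
  # SLURM_PROCID is the global task rank set by srun; "unset" outside a job.
  rank="${SLURM_PROCID:-unset}"
  # Device list exported by the gres/gpu plugin (assumption, see above).
  gpus="${CUDA_VISIBLE_DEVICES:-unset}"
  # awk inherits the task's CPU affinity, so its Cpus_allowed_list matches.
  cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null)
  printf 'task %s on %s: CUDA_VISIBLE_DEVICES=%s cpus=%s\n' \
    "$rank" "$(uname -n)" "$gpus" "${cpus:-unknown}"
}
```

Saved as show_binding.sh in a directory visible on the compute nodes (an assumption), it could be run as srun -l --gres=gpu:2 -n4 --accel-bind=g bash -c '. ./show_binding.sh && show_binding'. Comparing the output with a run without --accel-bind=g shows whether the per-task device list changes. Note that in the gres.conf above both devices list Cores=0,1, so GPU binding may have nothing to narrow down.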

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Danny Rotscher
HPC-Support

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
01062 Dresden
Tel.: +49 351 463-35853
Fax : +49 351 463-37773
E-Mail: danny.rotsc...@tu-dresden.de
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
