[slurm-dev] Re: How to strictly limit the memory per CPU

2017-11-03 Thread Bjørn-Helge Mevik
马银萍 writes: > Thanks all for your replies, > but what I need is to stop users from using --mem and --mem-per-cpu, and I > can't figure out how to do this with cgroups and TRES. How can I use the "submit > filter"? Should I modify the source code of SLURM? It sounds like this is a job for the lua job submit plugin ...
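A minimal job_submit.lua sketch along those lines (not taken from the thread): it rejects any job that carries an explicit memory request. The field name pn_min_memory and the use of slurm.NO_VAL64 as "not set" are assumptions that vary between Slurm releases, so check the job_submit/lua documentation for your version before relying on this.

    -- job_submit.lua (sketch, assumed field names): reject explicit --mem / --mem-per-cpu.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- ASSUMPTION: pn_min_memory holds the requested memory and equals
        -- slurm.NO_VAL64 when the user passed neither --mem nor --mem-per-cpu.
        if job_desc.pn_min_memory ~= nil and
           job_desc.pn_min_memory ~= slurm.NO_VAL64 then
            slurm.log_user("Do not use --mem or --mem-per-cpu; memory is assigned per CPU by the site.")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end

The script is enabled with JobSubmitPlugins=lua in slurm.conf and is placed as job_submit.lua in the same directory as slurm.conf; no changes to the SLURM source are needed.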

[slurm-dev] Wrong device order in CUDA_VISIBLE_DEVICES

2017-11-03 Thread Maik Schmidt
Dear all, first, let me say that we do not use ConstrainDevices in our setup, so we have to rely on CUDA_VISIBLE_DEVICES to ensure that user applications use the correct GPU that they have been allocated on our multi-GPU nodes. This seemed to work well for quite some time on our homogeneous nodes, but ...

[slurm-dev] Query about Compute + GPUs

2017-11-03 Thread Ing. Gonzalo E. Arroyo
Hi Gents! I would need some configuration to set up a compute node that has 8 cores so that I can use 6 for normal compute tasks and 2 for a GPU partition. I made 2 partitions in SLURM, but both detect all 8 cores; I need one partition with 6 cores and the GPU partition with 2. Thanks! Ing. Gonzalo ...

[slurm-dev] Re: Query about Compute + GPUs

2017-11-03 Thread Merlin Hartley
Sounds like you would need 2 different NodeName lines - one in each partition. -- Merlin Hartley Computer Officer MRC Mitochondrial Biology Unit Cambridge, CB2 0XY United Kingdom > On 3 Nov 2017, at 15:08, Ing. Gonzalo E. Arroyo wrote: > Hi Gents! > I would need some configuration to set up ...

[slurm-dev] Re: Query about Compute + GPUs

2017-11-03 Thread Ing. Gonzalo E. Arroyo
Hi Merlin! Thanks for helping. Are you sure I can put 2 lines in the nodenames.conf file with the same NodeName? How can SLURM choose which line belongs to which partition? I have this now ... Auto Created NodeName=flash-10-1 NodeAddr=10.1.10.1 CPUs=4 Weight=20482899 Feature=rack-10,4CPU ...

[slurm-dev] PMIx at SC'17

2017-11-03 Thread r...@open-mpi.org
My apologies for the shameless promotion, but for those interested, there will be a PMIx BoF meeting this year at SC’17 on Thursday, November 16, 2017, at 12:15pm: http://sc17.supercomputing.org/presentation/?id=bof104&sess=sess308

[slurm-dev] Slurm booth presentations?

2017-11-03 Thread Bill Wichser
Is there a schedule of speakers at the Slurm booth yet? Other than Ralph at 10 on the 15th? Thanks, Bill

[slurm-dev] Re: Wrong device order in CUDA_VISIBLE_DEVICES

2017-11-03 Thread Kilian Cavalotti
Hi Maik, On Fri, Nov 3, 2017 at 2:14 AM, Maik Schmidt wrote: > It is my understanding that when ConstrainDevices is not set to "yes", SLURM > uses the so-called "Minor Number" (nvidia-smi -q | grep Minor), that is, the > number in the device name (/dev/nvidia0 -> ID 0 and so on), and puts it in ...

[slurm-dev] Re: Query about Compute + GPUs

2017-11-03 Thread Merlin Hartley
They would need to have different NodeNames but the same NodeAddr, for example: NodeName=fisesta-21-3 NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1 Hope this is useful! Merlin
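To tie those two node definitions to separate partitions, the corresponding slurm.conf fragment could look roughly like the sketch below. Only the NodeName lines come from the thread; the partition names ("compute", "gpu") and the other options are illustrative assumptions, and the GPU entry would also need a matching gres.conf line on the host.

    # slurm.conf sketch (partition names and options are assumed)
    NodeName=fisesta-21-3     NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs
    NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1
    PartitionName=compute Nodes=fisesta-21-3     Default=YES State=UP
    PartitionName=gpu     Nodes=fisesta-21-3-gpu Default=NO  State=UP

Since both entries point at the same NodeAddr, the scheduler treats them as two logical nodes whose CPU counts add up to the 8 physical cores (6 + 2).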

[slurm-dev] Offlining Faulty GPU?

2017-11-03 Thread Ryan Novosielski
Hi all, Does anyone have any tricks for offlining a faulty GPU that aren't terribly cumbersome? I'm aware of editing the configs and removing the GPU, etc., but I'm wondering if there's something similar or something particularly clever that makes ...

[slurm-dev] Re: Query about Compute + GPUs

2017-11-03 Thread Ing. Gonzalo E. Arroyo
I think that this will work!! Thanks for your help, but instead of adding "<...>-gpu", I added "<...>-cpus" to the 6-cpus line. In Rocks Clusters I also had to make this... rocks add host fisesta-21-3-cpus membership="compute" rack=21 rank=3 rocks set host cpus fisesta-21-3-cpus cpus=6 rocks set h