Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-21 Thread Daniel Letai
Hi Peter, On 3/20/19 11:19 AM, Peter Steinbach wrote: [root@ernie /]# scontrol show node -dd g1 NodeName=g1 CoresPerSocket=4    CPUAlloc=3 CPUTot=4 CPULoad=N/A    AvailableFeatures=(null)    ActiveFeat

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-21 Thread Peter Steinbach
After more tests, the situation becomes a bit clearer. If "Cores=0,1" (etc.) is present in the `gres.conf` file, then one can only place single-core gres jobs on the node by using `--gres-flags=disable-binding` if a non-gres job is already running on the same node. If "Cores=0,1" is NOT present in `gres.conf`, then any
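Peter's observation can be illustrated with a minimal configuration sketch. This is an assumption for a hypothetical 4-core, 2-GPU node (device paths, node name, and core ranges are illustrative, not taken from the thread):

```
# gres.conf (hypothetical node "g1" with 4 cores and 2 GPUs)
# With Cores= present, Slurm binds each GPU to specific cores; if a
# non-gres job already occupies those cores, a gres job cannot start
# there unless it is submitted with --gres-flags=disable-binding.
NodeName=g1 Name=gpu File=/dev/nvidia0 Cores=0-1
NodeName=g1 Name=gpu File=/dev/nvidia1 Cores=2-3
```

With such a file in place, a submission like `sbatch --gres=gpu:1 --gres-flags=disable-binding --wrap="hostname"` asks Slurm to ignore the GPU-to-core binding for that job, matching the behavior Peter describes.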

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Christopher Samuel
On 3/20/19 9:09 AM, Peter Steinbach wrote: Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I made sure to set the --mem flag for the job submission this time. Well there you've sa

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Interestingly enough, if I add Cores=0-1 and Cores=2-3 to the gres.conf file, everything stops working again. :/ Should I send around scontrol outputs? And yes, I made sure to set the --mem flag for the job submission this time. Best, Peter

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Philippe, thanks for spotting this. This indeed appears to solve the first issue. Now I can try to make the GPUs available and play with pinning etc. Superb - if you happen to be at ISC, let me know. I'd buy you a coffee/beer! ;) Peter

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Philippe Dos Santos
SelectTypeParameters=CR_Core_Memory Therefore, there's no more memory available for incoming jobs (jobs 15 and 16). Regards, Philippe DS - Original message - From: "Peter Steinbach" To: slurm-users@lists.schedmd.com Sent: Wednesday, 20 March 2019 10:19:07 Subject: Re: [slur
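Philippe's diagnosis can be sketched with the relevant slurm.conf lines. This is a minimal fragment under the thread's stated configuration; the comment about default memory behavior is how CR_Core_Memory generally plays out, not a quote from the thread:

```
# slurm.conf (fragment) - memory is a consumable resource here
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# With CR_Core_Memory, every job is charged a memory allocation. A job
# submitted without --mem (and with no DefMemPerCPU/DefMemPerNode set)
# can be allocated all of the node's memory, so later jobs are held
# pending even though cores remain free.
```

This is why setting an explicit `--mem` on submission, as Peter confirms above, lets further jobs onto the node.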

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-20 Thread Peter Steinbach
Hi Chris, I changed the initial state a bit (the number of cores per node was misconfigured): https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf But that doesn't change things. Initially, I see this: # sinfo -N -l Wed Mar 20 09:03:26 2019 NODELIST NO

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Christopher Samuel
On 3/19/19 5:31 AM, Peter Steinbach wrote: For example, let's say I have a 4-core GPU node called gpu1. A non-GPU job $ sbatch --wrap="sleep 10 && hostname" -c 3 Can you share the output for "scontrol show job [that job id]" once you submit this please? Also please share "scontrol show node

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Hi Benson, As you can perhaps see from our slurm.conf, we have task affinity and similar switches turned off. Along the same lines, I also removed the core binding of the GPUs. That is why I am quite surprised that Slurm doesn't allow new jobs in. I am aware of the PCIe bandwidth implications of a GP

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Benson Muite
Hi, Many MPI implementations will have some sort of core-binding allocation policy, which may impact such node sharing. Would this only be limited to single-CPU jobs? Can users request a particular core, for example for a GPU-based job some cores will have better memory transfer rates to the

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
I've read through the parameters. I am not sure if any of those would help in our situation. What suggestions would you make? Note, it's not the scheduler policy that appears to hinder us. It's about how slurm keeps track of the generic resource and (potentially) binds it to available cores. Th

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Dear Eli, thanks for your reply. The slurm.conf file I suggested lists this parameter. We use SelectType=select/cons_res SelectTypeParameters=CR_Core_Memory See also: https://github.com/psteinb/docker-centos7-slurm/blob/18.08.5-with-gres/slurm.conf#L60 I'll check if that makes a difference.

Re: [slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Eli V
On Tue, Mar 19, 2019 at 8:34 AM Peter Steinbach wrote: > > Hi, > > we are struggling with a slurm 18.08.5 installation of ours. We are in a > situation, where our GPU nodes have a considerable number of cores but > "only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs > can ente

[slurm-users] Sharing a node with non-gres and gres jobs

2019-03-19 Thread Peter Steinbach
Hi, we are struggling with a Slurm 18.08.5 installation of ours. We are in a situation where our GPU nodes have a considerable number of cores but "only" 2 GPUs inside. While people run jobs using the GPUs, non-GPU jobs can enter alright. However, we found out the hard way that the inverse
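The setup Peter describes can be sketched as a slurm.conf node definition. The core count, memory size, node name, and partition name below are assumptions for illustration; his actual configuration is linked earlier in the thread:

```
# slurm.conf (fragment) - a many-core node carrying only 2 GPUs
GresTypes=gpu
NodeName=gpu1 CPUs=32 RealMemory=256000 Gres=gpu:2 State=UNKNOWN
PartitionName=gpu Nodes=gpu1 Default=YES MaxTime=INFINITE State=UP
# Desired behavior: non-GPU jobs fill the spare cores while GPU jobs
# run, and GPU jobs can still start while non-GPU jobs occupy cores.
```

The thread above traces the two obstacles to that goal: GPU-to-core binding in gres.conf (worked around with `--gres-flags=disable-binding`) and memory accounting under CR_Core_Memory (worked around with an explicit `--mem` request).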