[slurm-users] Free Gres resources

2018-02-12 Thread Nadav Toledo
Hello everyone, Does anyone know of a way to get the number of idle GPUs per node, or for the whole cluster? sinfo -o %G gives the total amount of GRES resources for each node. Is there a way to get the idle amount, the same way you can for CPUs (%C)? Perhaps if one use
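The arithmetic behind the question (idle = configured minus in use) can be sketched as a small filter. This is only an illustration, not an answer taken from the thread: the function name idle_gpus is hypothetical, and it assumes each input line has already been reduced to three whitespace-separated columns (node name, configured GRES string, in-use GRES string). How the in-use column is obtained depends on the Slurm version and is not covered here.

```shell
# Hypothetical helper: reads "node configured_gres used_gres" lines on stdin
# and prints "node idle_gpu_count" for each line.
idle_gpus() {
    awk '
    # Sum the counts of all "gpu" entries in a GRES string such as
    # "gpu:4", "gpu:tesla:2(IDX:0-1)" or "gpu:k80:4,mic:1".
    function gpu_count(spec,    n, i, parts, f, m, total) {
        gsub(/\([^)]*\)/, "", spec)      # drop "(IDX:...)" style suffixes
        n = split(spec, parts, ",")      # one entry per GRES type
        total = 0
        for (i = 1; i <= n; i++) {
            m = split(parts[i], f, ":")
            # The count is assumed to be the last colon-separated field.
            if (f[1] == "gpu" && m >= 2) total += f[m]
        }
        return total
    }
    { print $1, (gpu_count($2) - gpu_count($3)) }
    '
}
```

For example, feeding it the line "node01 gpu:4 gpu:2" prints "node01 2". Entries for other GRES types are ignored, and a GRES string with no gpu entry counts as zero.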

Re: [slurm-users] Too many single-stream jobs?

2018-02-12 Thread Andy Riebs
Many thanks Matthieu! Andy On 02/12/2018 06:42 PM, Matthieu Hautreux wrote: Hi, your login node may have a heavy load while starting such a large number of independent sruns. This may induce issues not seen under normal load, like partial read/write on sockets, triggering bugs in slurm, f

Re: [slurm-users] Too many single-stream jobs?

2018-02-12 Thread Matthieu Hautreux
Hi, your login node may have a heavy load while starting such a large number of independent sruns. This may induce issues not seen under normal load, like partial read/write on sockets, triggering bugs in slurm, for functions not properly protected against such events. Quickly looking at the sou

[slurm-users] Too many single-stream jobs?

2018-02-12 Thread Andy Riebs
We have a user who wants to run multiple instances of a single-process job across a cluster, using a loop like - for N in $nodelist; do srun -w $N program & done wait - This works up to a thousand nodes or so (jobs are allocated by node here), but as the number of jobs submitted i
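One way to limit how many sruns are started at once is to launch them in fixed-size batches. The sketch below is a variation on the loop above, not something proposed in the thread: the names run_in_batches and launch_on_node are hypothetical, "program" stands in for the user's executable as in the original loop, and the batch size is an arbitrary choice. Whether this avoids the reported failures is untested.

```shell
# Hypothetical wrapper around the per-node launch from the original loop.
launch_on_node() {
    srun -w "$1" program
}

# Hypothetical helper: run_in_batches BATCH_SIZE NODE...
# Starts at most BATCH_SIZE launches in the background, waits for all of
# them to finish, then starts the next batch.
run_in_batches() {
    batch=$1
    shift
    count=0
    for node in "$@"; do
        launch_on_node "$node" &
        count=$((count + 1))
        if [ "$count" -ge "$batch" ]; then
            wait            # let the current batch drain before continuing
            count=0
        fi
    done
    wait                    # wait for the final, possibly partial, batch
}
```

It would be called as, for example, run_in_batches 100 $nodelist. Batching is simpler than a sliding window but leaves slots idle while the slowest job in each batch finishes; that trade-off favors portability across shells here.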

Re: [slurm-users] Should I join the federation?

2018-02-12 Thread Vicker, Darby (JSC-EG311)
We recently brought a new cluster online with the desire to federate it with our existing cluster. See the full story here: https://bugs.schedmd.com/show_bug.cgi?id=4512 There are some fairly large limitations to federation, the biggest of which (for us anyway) was: > The current implementati

Re: [slurm-users] Set priority without sudo and without a database ?

2018-02-12 Thread Magnus Jonsson
On 2018-02-12 11:37, Fabien ELOY wrote: Hello, I am trying to set priority ... but it doesn't work! If I type sudo srun --priority=X, it's OK. But if I use my "standard" user it's not OK (priority calculated by slurm). I do not have a database used with SLURM. In my slurm.conf, "SlurmUs

Re: [slurm-users] Set priority without sudo and without a database ?

2018-02-12 Thread Loris Bennett
Fabien ELOY writes: >> 2018-02-12 11:51 GMT+01:00 Loris Bennett : >> >> Hi Fabien, >> >> Fabien ELOY writes: >> >> > Hello, >> > >> > I am trying to set priority ... but it doesn't work ! >> > >> > If I type sudo srun --priority=X, it's OK. But if I use my "standard" >> user it's not OK

Re: [slurm-users] Set priority without sudo and without a database ?

2018-02-12 Thread Fabien ELOY
Hi Loris, Thank you for your reply. SLURM jobs are submitted by a Java application and there is only one SLURM user. Should we use another plugin (not the multifactor plugin)? Is there a way to fix user rights? Below is my slurm.conf ("anonymized"): SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid Sl

Re: [slurm-users] Set priority without sudo and without a database ?

2018-02-12 Thread Loris Bennett
Hi Fabien, Fabien ELOY writes: > Hello, > > I am trying to set priority ... but it doesn't work! > > If I type sudo srun --priority=X, it's OK. But if I use my "standard" user > it's not OK (priority calculated by slurm). > > I do not have a database used with SLURM. > > In my slurm.conf, "Slu

[slurm-users] Set priority without sudo and without a database ?

2018-02-12 Thread Fabien ELOY
Hello, I am trying to set priority ... but it doesn't work! If I type sudo srun --priority=X, it's OK. But if I use my "standard" user it's not OK (priority calculated by slurm). I do not have a database used with SLURM. In my slurm.conf, "SlurmUser=slurm" and my server has 2 users in the sam

[slurm-users] Should I join the federation?

2018-02-12 Thread Yair Yarom
Hi all, I was wondering if any of you can share your insights regarding federations. What unexpected caveats have you encountered? We have here about 15 "small" clusters (due to political and technical reasons), and most users have access to more than one cluster. Federation seems like a g