Re: [slurm-users] maximum number of pending jobs

2018-03-09 Thread Ole Holm Nielsen
On 03/08/2018 04:49 PM, Renat Yakupov wrote:
> Thank you, Ole. That is exactly it. And it probably answers a lot of future questions, since I now know how to see the configuration information.
Good to hear! "scontrol show config" shows many, but not all, Slurm parameters. You may have to
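A quick way to check the parameter in question here, presumably MaxJobCount (the cap on the total number of jobs slurmctld keeps in memory); the value shown is Slurm's compiled-in default, not necessarily your site's:

$ scontrol show config | grep -i MaxJobCount
MaxJobCount             = 10000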

[slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi, I am trying to get interactive jobs to work from the machine we use as a login node, i.e., where the users of the cluster log in and from where they typically submit jobs. I submit the job as follows:

vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun bash -i
salloc: Granted

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Pickering, Roger (NIH/NIAAA) [E]
I'm confused. Why would you want to run an interactive program using srun? Roger

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Michael Robbert
I think that the piece you may be missing is --pty, but I also don't think that salloc is necessary. The simplest command that I typically use is:

srun -N1 -n1 --pty bash -i

Mike

On 3/9/18 10:20 AM, Andy Georges wrote:
> Hi, I am trying to get interactive jobs to work from the machine we
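For illustration, a minimal session of the kind this should produce when things work (hostnames are assumptions):

login01$ srun -N1 -n1 --pty bash -i
node01$ hostname
node01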

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi, Adding --pty makes no difference. I do not get a prompt, and on the node the logs show an error. With --pty the error is somewhat different from the one without it, but the end result is the same. My main issue is that giving the same command on the machines running slurmd and slurmctld

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Mark M
I'm having the same issue. The salloc command hangs on my login nodes, but works fine on the head node. My default salloc command is:

SallocDefaultCommand="/usr/bin/srun -n1 -N1 --pty --preserve-env $SHELL"

I'm on the OpenHPC slurm 17.02.9-69.2. The log says the job is assigned, then eventually

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi all, Cranked up the debug level a bit. The job was not started when using:

vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun --pty bash -i
salloc: Granted job allocation 42
salloc: Waiting for resource configuration
salloc: Nodes node2801 are ready for job

For comparison purposes, runn

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Nicholas McCollum
Connection refused makes me think a firewall issue. Assuming this is a test environment, could you try on the compute node:

# iptables-save > iptables.bak
# iptables -F && iptables -X

Then test to see if it works. To restore the firewall, use:

# iptables-restore < iptables.bak

You may have to

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Mark M
In my case I tested the firewall. But I'm wondering whether the login nodes need to appear in slurm.conf, and also whether slurmd needs to be running on the login nodes in order for them to be a submit host? Either or both could be my issue. On Fri, Mar 9, 2018 at 12:58 PM, Nicholas McCollum wrote: > Conn
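For what it's worth, a submit-only host generally does not need slurmd, only the client commands plus the cluster's slurm.conf and munge key; a rough sketch (paths and the head-node name are assumptions):

# on the login node, as root:
# scp head:/etc/slurm/slurm.conf /etc/slurm/slurm.conf
# scp head:/etc/munge/munge.key /etc/munge/munge.key
# systemctl start munge
# sinfo    # should now reach slurmctld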

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Andy Georges
Hi,

> On 9 Mar 2018, at 21:58, Nicholas McCollum wrote:
>
> Connection refused makes me think a firewall issue.
>
> Assuming this is a test environment, could you try on the compute node:
>
> # iptables-save > iptables.bak
> # iptables -F && iptables -X
>
> Then test to see if it works. To

Re: [slurm-users] Problem launching interactive jobs using srun

2018-03-09 Thread Mark M
OK, I'm eating my words now. Perhaps I have had multiple issues before, but at the moment stopping the firewall allows salloc to work. Can anyone suggest an iptables rule specific to slurm? Or a way to restrict slurm communications to the right network? On Fri, Mar 9, 2018 at 1:10 PM, Mark M wrot
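One possible shape for such rules, as a sketch only: Slurm's defaults are slurmctld on TCP 6817 and slurmd on TCP 6818, and srun itself listens on ephemeral ports unless SrunPortRange is set in slurm.conf. The 10.0.0.0/24 subnet and the 60001-63000 range below are assumptions:

# allow Slurm daemon traffic from the cluster network only
# iptables -A INPUT -s 10.0.0.0/24 -p tcp --dport 6817:6818 -j ACCEPT
# with SrunPortRange=60001-63000 in slurm.conf, also open that range:
# iptables -A INPUT -s 10.0.0.0/24 -p tcp --dport 60001:63000 -j ACCEPT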

[slurm-users] OverSubscribe can be used for cpu, but not worked for GPU?

2018-03-09 Thread Chaofeng Zhang
Below worked for CPU: with OverSubscribe, I can have more than 4 processes in running state, but if I add #SBATCH --gres=gpu:2 to the job file, there is just 1 process in running state and the others are pending. OverSubscribe can apparently only be used for the cpu resource; whether it
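A minimal sketch of the kind of job script being described (the task count and application name are assumptions):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --oversubscribe
#SBATCH --gres=gpu:2    # with this line, jobs no longer overlap
srun ./my_app           # hypothetical application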