Re: [slurm-users] Job step aborted

2018-05-17 Thread Mahmood Naderan
OK, I understand that. However, there is an issue with ntasks=1. Assume a user wants to launch an application with the number of cores given as a command-line argument. Bearing in mind that the CPU limit for the partition is 20 cores, the following example [mahmood@rocks7 ~]$ srun --x11 -A y8 -p RUBY -
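
A minimal sketch of what the poster seems to be describing, reusing the RUBY partition, account y8, and application path quoted elsewhere in this thread; passing the core count through $SLURM_NTASKS is an illustration, not the poster's exact command:

  # request 10 cores interactively on the RUBY partition
  srun --x11 -A y8 -p RUBY --ntasks=10 --mem=8GB --pty bash
  # inside the allocation, reuse the task count that Slurm exported
  /state/partition1/scfd/sc -t$SLURM_NTASKS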

Re: [slurm-users] PMIX and slurm failure (and fix).

2018-05-17 Thread Artem Polyakov
Thank you, Bill. Can you provide an anonymized slurm.conf (I am mainly interested in the auth settings), the srun launch error, and the config.log where you saw the libssl mention? As a PMIx plugin developer, I am not aware of any explicit dependency on libssl in Slurm. The only thing I can think of would be authentica
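
For context, the auth-related portion of a slurm.conf that such a request usually targets looks roughly like the following; this is a generic sketch, not Bill's configuration, and the hostname is a placeholder:

  # authentication and MPI settings typically inspected first
  AuthType=auth/munge
  CryptoType=crypto/munge
  MpiDefault=pmix
  ControlMachine=slurm-ctl.example.org   # placeholder hostname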

Re: [slurm-users] Creating custom partition GRES

2018-05-17 Thread Sébastien VIGNERON
Hello Almon, Did you look at the NodeName/Feature list functionality with sbatch --constraint before choosing GRES? Best regards, Sebastien VIGNERON > On 18 May 2018 at 00:02, Almon Gem Otanes wrote: > > Hi everyone, > Is there a way to add GRES/features/attributes (not sure which is the correc
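
A small sketch of the suggested alternative, with node names and the feature label chosen purely for illustration:

  # slurm.conf: tag nodes with a feature instead of defining a GRES
  NodeName=node[01-04] Feature=systemA
  # job submission: request nodes carrying that feature
  sbatch --constraint=systemA job.sh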

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Sean Caron
Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will give that a shot and see what happens. Best, Sean On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux <matthieu.hautr...@gmail.com> wrote: > Hi, > > Communications in Slurm are not only performed from controller to slurmd

[slurm-users] Creating custom partition GRES

2018-05-17 Thread Almon Gem Otanes
Hi everyone, Is there a way to add GRES/features/attributes (not sure which is the correct term) to partitions? I'm trying to port from SGE to SLURM. Our current setup has queues (partitions) that refer to physical systems. The binaries we want to execute are just scripts that don't need to run on t

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Matthieu Hautreux
Hi, Communications in Slurm are not only performed from the controller to slurmd and from slurmd to the controller. You need to ensure that your login nodes can reach the controller and the slurmd nodes, as well as ensure that slurmd on the various nodes can contact each other. This last requirement is bec
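
The rules this implies usually look something like the sketch below, assuming the default ports (SlurmctldPort=6817, SlurmdPort=6818) and a 10.0.0.0/16 cluster subnet chosen purely for illustration:

  # compute node: accept slurmctld/slurmd traffic from the cluster subnet
  iptables -A INPUT -p tcp -s 10.0.0.0/16 --dport 6818 -j ACCEPT
  # login node: accept the callbacks slurmd makes to srun's ephemeral ports
  iptables -A INPUT -p tcp -s 10.0.0.0/16 --dport 1024:65535 -j ACCEPT

The SrunPortRange option in slurm.conf can narrow that ephemeral range if opening it wholesale is undesirable.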

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Sean Caron
Sorry, how do you mean? The environment is very basic. The compute nodes and SLURM controller are on an RFC1918 subnet. Gateways are dual-homed, with one leg on a public IP and one leg on the RFC1918 cluster network. It used to be that nodes that only had a leg on the RFC1918 network (compute nodes and

Re: [slurm-users] Job step aborted

2018-05-17 Thread Matthieu Hautreux
On Thu., 17 May 2018 at 11:28, Mahmood Naderan wrote: > Hi, > For an interactive job via srun, I see that after opening the GUI, the > session is terminated automatically, which is weird. > > [mahmood@rocks7 ansys_test]$ srun --x11 -A y8 -p RUBY --ntasks=10 > --mem=8GB --pty bash > [mahmood@compute

Re: [slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

2018-05-17 Thread Patrick Goetz
Does your SMS have a dedicated interface for node traffic? On 05/16/2018 04:00 PM, Sean Caron wrote: I see some chatter on 6818/TCP from the compute node to the SLURM controller, and from the SLURM controller to the compute node. The policy is to permit all packets inbound from SLURM controlle

Re: [slurm-users] Job step aborted

2018-05-17 Thread Mahmood Naderan
I have opened a bug ticket at https://bugs.schedmd.com/show_bug.cgi?id=5182 It is annoying... Regards, Mahmood On Thu, May 17, 2018 at 1:54 PM, Mahmood Naderan wrote: > Hi, > For an interactive job via srun, I see that after opening the GUI, the > session is terminated automatically, which is

[slurm-users] PMIX and slurm failure (and fix).

2018-05-17 Thread Bill Broadley
Greetings all, Just wanted to mention I have been building the newest Slurm on Ubuntu 18.04. GCC 7.3 is the default compiler, which means that the various dependencies (munge, libevent, hwloc, netloc, pmix, etc.) are already available and built with GCC 7.3. I carefully built slurm-17.11.6 + openmpi
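
A rough outline of that kind of build, assuming the PMIx packages shipped with Ubuntu 18.04 are installed under /usr; the install prefixes are illustrative:

  # build Slurm 17.11.6 against the distro-provided PMIx
  ./configure --prefix=/opt/slurm-17.11.6 --with-pmix=/usr && make && make install
  # build Open MPI against the same PMIx and with Slurm support
  ./configure --prefix=/opt/openmpi --with-pmix=/usr --with-slurm && make && make install

Pointing both stacks at the same external PMIx is the usual way to keep srun --mpi=pmix and Open MPI in agreement.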

[slurm-users] Job step aborted

2018-05-17 Thread Mahmood Naderan
Hi, For an interactive job via srun, I see that after opening the GUI, the session is terminated automatically, which is weird. [mahmood@rocks7 ansys_test]$ srun --x11 -A y8 -p RUBY --ntasks=10 --mem=8GB --pty bash [mahmood@compute-0-6 ansys_test]$ /state/partition1/scfd/sc -t10 srun: First task ex

Re: [slurm-users] X11 debug

2018-05-17 Thread Marco Ehlert
On Thu, 17 May 2018, Nadav Toledo wrote: Hello everyone, After fighting with X11 forwarding for a couple of weeks, I think I've got a few tips that can help others. I am using Slurm 17.11.6 with built-in X11 forwarding on the Ubuntu Server distro; all servers in the cluster share /home via BeeGFS. slurm w
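
For reference, the built-in forwarding under discussion is enabled roughly like this in Slurm 17.11 (a sketch; the interactive command mirrors the ones quoted earlier in this digest):

  # slurm.conf: turn on the native X11 forwarding code
  PrologFlags=X11
  # on a login node with a valid $DISPLAY, forward X11 into an interactive step
  srun --x11 --pty bash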

Re: [slurm-users] X11 debug

2018-05-17 Thread Ole Holm Nielsen
On 05/17/2018 08:45 AM, Nadav Toledo wrote: Hello everyone, After fighting with X11 forwarding for a couple of weeks, I think I've got a few tips that can help others. I am using Slurm 17.11.6 with built-in X11 forwarding on the Ubuntu Server distro; all servers in the cluster share /home via BeeGFS. slu