Please try something very simple, such as a hello world program or "srun -N2 -n8 hostname".
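A minimal sketch of such a hello world test, in case it helps. It deliberately uses no MPI at all, so a failure here would point at the Slurm launch itself rather than at MPI or pmix. The file name hello_slurm.py and the function name describe_task are made up for illustration. The script only reads the SLURM_PROCID, SLURM_NTASKS and SLURM_NODEID environment variables that srun sets for each task, and prints "?" for any that are missing.

```python
import os
import socket


def describe_task(env=None, hostname=None):
    """Return a one-line description of the current Slurm task.

    env and hostname can be passed in explicitly for testing; by default
    the real process environment and the local host name are used.
    """
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname

    # srun exports these variables for every task it launches.
    rank = env.get("SLURM_PROCID", "?")     # global task id, starting at 0
    ntasks = env.get("SLURM_NTASKS", "?")   # total number of tasks in the step
    node_id = env.get("SLURM_NODEID", "?")  # index of the node within the allocation

    return f"task {rank} of {ntasks} on node {node_id} ({hostname})"


if __name__ == "__main__":
    print(describe_task())
```

Saved as hello_slurm.py (hypothetical name), it can be launched the same way as the failing job, for example "srun -N2 -n8 python3 hello_slurm.py". If all eight lines appear and both host names show up, the multi-node launch works and the problem is more likely on the MPI side.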
What is the error message you get?

On Fri, 12 Jul 2019 at 07:07, Pär Lundö <par.lu...@foi.se> wrote:
>
> Hi there Slurm-experts!
> I am having trouble running a python-mpi program involving more than
> one node. The python-mpi program is very simple; it only lists the number
> of ranks that are available in its environment. I have a munge daemon
> running prior to starting the slurm service, and the program works when
> using a single node (so I suppose munge is working).
> In addition, I have tested a simple sbatch script where each
> available node (four nodes) states its hostname and returns.
> Since authentication with Slurm is done via munge, do I need
> passwordless SSH communication between the slurmctl and the nodes? (I found
> a guide, probably outdated, stating that passwordless SSH communication is a
> necessity for Slurm,
> HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).
>
> I run the python-mpi program via an sbatch script, invoking an srun command.
> Each node has 8 CPUs.
> The srun command is:
> "srun -N2 -n8 python3 python-mpi.py",
> when tested on two nodes.
> It works fine running on a single node (with "-N1" instead of "-N2"), but
> it is aborted or stopped when running on two nodes.
> Should I have "-n16" when running on two nodes? (In order to allocate the
> complete number of CPUs available on the two nodes.)
> Slurm is configured and built with pmix.
> I am running Slurm 19.05 on Ubuntu 18.04 as server and the nodes are
> running the same Slurm version on Ubuntu 18.10.
>
> Best regards,
>
> Palle