My apologies. You do say that the Python program simply prints the rank, so it is a hello world program.
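If it helps to separate a Slurm launch problem from an MPI problem, a check that does not use MPI at all might look like the sketch below. It is not your program, only an illustration: the file name hello_slurm.py and the function describe_task are made up here, and it assumes that srun exports SLURM_PROCID and SLURM_NTASKS to each task, which it normally does.

```python
# hello_slurm.py (hypothetical name): a hello world that needs no MPI library.
import os
import socket


def describe_task(env=None):
    """Return a one-line description of this Slurm task.

    SLURM_PROCID and SLURM_NTASKS are normally set by srun for every task
    it launches. Outside of Slurm they are missing, so fall back to 0 and 1.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", "0"))
    ntasks = int(env.get("SLURM_NTASKS", "1"))
    host = socket.gethostname()
    return f"task {rank} of {ntasks} on {host}"


if __name__ == "__main__":
    print(describe_task())
```

Run with the same options as the failing job, for example "srun -N2 -n8 python3 hello_slurm.py". If this prints eight lines spread over two hostnames, the launch itself works and the problem is more likely in the MPI/PMIx layer.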
On Fri, 12 Jul 2019 at 07:45, John Hearns <hear...@googlemail.com> wrote:

> Please try something very simple such as a hello world program or
> srun -N2 -n8 hostname
>
> What is the error message which you have?
>
> On Fri, 12 Jul 2019 at 07:07, Pär Lundö <par.lu...@foi.se> wrote:
>
>>
>> Hi there Slurm-experts!
>> I am having trouble running a python-mpi program involving more than
>> one node. The python-mpi program is very simple, it only lists the number
>> of ranks that are available in its environment. I have a munge-daemon
>> running prior to starting the slurm-service and the program works when
>> using a single node (so I suppose munge is working).
>> In addition, I have tested running a simple sbatch-script where each
>> available node (four nodes) states its hostname and returns.
>> Since authentication with Slurm is done via munge, do I need
>> passwordless SSH communication between the slurmctl and the nodes? (I found
>> a guide, probably outdated, stating that passwordless SSH communication is a
>> necessity for Slurm,
>> HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm).
>>
>> I run the python-mpi program via an sbatch-script, invoking a srun-command.
>> Each node has 8 CPUs.
>> The srun-command is:
>> ”srun -N2 -n8 python3 python-mpi.py”,
>> when tested on two nodes.
>> It works fine running on a single node (with ”-N1” instead of ”-N2”), but
>> it is aborted or stopped when running on two nodes.
>> Should I have ”-n16” when running on two nodes? (In order to allocate the
>> complete number of CPUs available on the two nodes.)
>> Slurm is configured and built with pmix.
>> I am running Slurm 19.05 on Ubuntu 18.04 as server and the nodes are
>> running the same slurm-version on Ubuntu 18.10.
>>
>> Best regards,
>>
>> Palle
>>
>