Hi there, Slurm experts!
I am having trouble running a python-mpi program on more than one node. The
program is very simple: it only lists the number of ranks available in its
environment. The munge daemon is started before the Slurm service, and the
program works on a single node (so I suppose munge is working).
I have also tested a simple sbatch script in which each of the four available
nodes prints its hostname and returns.
Since Slurm uses munge for authentication, do I still need passwordless SSH
between the slurmctld host and the nodes? (I found a guide, probably outdated,
stating that passwordless SSH is a necessity for Slurm:
HTTP://admin-magazine.com/HPC/Articles/Resource-Management-with-Slurm.)
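The hostname test exercises Slurm's launch path, but not the exact task layout of the failing MPI job. A closer check is to launch the same layout without MPI, so that a Slurm launch problem can be told apart from an MPI/PMIx one. A minimal sketch, assuming Python 3 is installed on every node; the script name check_tasks.py and the function describe_task are hypothetical, while SLURM_PROCID, SLURM_STEP_NUM_TASKS and SLURMD_NODENAME are environment variables that srun sets for each task:

```python
#!/usr/bin/env python3
"""Report which Slurm task this process is, without using MPI at all.

Hypothetical helper: run it with the same layout as the MPI job, e.g.
    srun -N2 -n8 python3 check_tasks.py
"""
import os
import socket
from typing import Mapping


def describe_task(env: Mapping[str, str]) -> str:
    # SLURM_PROCID is this task's rank within the step;
    # SLURM_STEP_NUM_TASKS is the number of tasks in the step.
    rank = env.get("SLURM_PROCID", "?")
    ntasks = env.get("SLURM_STEP_NUM_TASKS", "?")
    # SLURMD_NODENAME is set per node by slurmd; outside Slurm,
    # fall back to the operating system's hostname.
    node = env.get("SLURMD_NODENAME") or socket.gethostname()
    return f"task {rank} of {ntasks} on {node}"


if __name__ == "__main__":
    # flush so lines from different tasks are not held in buffers
    print(describe_task(os.environ), flush=True)
```

If this prints eight lines spread over two hostnames while python-mpi.py still aborts, the problem is more likely in the MPI/PMIx layer than in munge or in Slurm's own task launch.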

I run the python-mpi program via an sbatch script that invokes srun. Each node
has 8 CPUs. When testing on two nodes, the srun command is:

    srun -N2 -n8 python3 python-mpi.py

It works fine on a single node (with "-N1" instead of "-N2"), but the job is
aborted or stopped when running on two nodes.
Should I use "-n16" when running on two nodes, in order to allocate all the
CPUs available on the two nodes?
Slurm is configured and built with PMIx.
I am running Slurm 19.05 on Ubuntu 18.04 on the server, and the nodes run the
same Slurm version on Ubuntu 18.10.
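The CPU arithmetic behind the "-n16" question: two nodes with 8 CPUs each give 2 × 8 = 16 CPUs, so 8 single-CPU tasks occupy half of them, while 16 tasks would place one task per CPU. A minimal sketch for checking how the tasks were actually spread, assuming Python 3 on the nodes: it expands the SLURM_TASKS_PER_NODE variable that sbatch sets in the batch script's environment, whose format is a comma-separated list where "N(xR)" means N tasks on each of R consecutive nodes. The function names and the 8-CPU default are illustrative, not part of Slurm:

```python
import os
import re
from typing import List

# One comma-separated entry is either "N" or "N(xR)" (N tasks on each of R nodes).
_ENTRY = re.compile(r"^(\d+)(?:\(x(\d+)\))?$")


def expand_tasks_per_node(value: str) -> List[int]:
    """Expand a SLURM_TASKS_PER_NODE string such as "2(x3),1" to [2, 2, 2, 1]."""
    counts: List[int] = []
    for entry in value.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2) or 1)  # a bare "N" means a single node
        counts.extend([tasks] * repeat)
    return counts


def idle_cpus(value: str, cpus_per_node: int = 8) -> List[int]:
    # Assumes one CPU per task; 8 CPUs per node matches the nodes in this post.
    return [cpus_per_node - tasks for tasks in expand_tasks_per_node(value)]


if __name__ == "__main__":
    value = os.environ.get("SLURM_TASKS_PER_NODE")
    if value is None:
        print("SLURM_TASKS_PER_NODE is not set (not inside a Slurm job?)")
    else:
        counts = expand_tasks_per_node(value)
        print(f"tasks per node: {counts}, total {sum(counts)}")
        print(f"idle CPUs per node: {idle_cpus(value)}")
```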

Best regards,

Palle
