Hi;
Your MPI and NAMD runs use your second network because your applications
were not compiled for InfiniBand. There are many pre-built NAMD versions;
the verbs and ibverbs builds are the ones that use InfiniBand.
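For example, a rough sketch of launching a pre-built ibverbs NAMD (the
core count and input file below are only placeholders): list the
InfiniBand hostnames from your /etc/hosts in a charmrun nodelist and
start the ibverbs namd2 binary with it:

    # nodelist file for charmrun, using the 192.168.13.X hostnames
    group main
      host infi01
      host infi02

    # launch on 16 cores with the ibverbs build of namd2
    ./charmrun +p16 ++nodelist nodelist ./namd2 input.namd

With the ibverbs build the hostnames are only used to start the remote
processes (over ssh); the NAMD traffic itself then goes over the verbs
interface.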
Also, when you compile MPI from source, you should check that the
configure script detects the InfiniBand network so that InfiniBand is
actually used, and the same applies when you compile SLURM.
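As a rough example (assuming Open MPI; the subnet below is your
192.168.13.X InfiniBand network, so adjust as needed), you can check
whether your installed MPI was built with an InfiniBand transport,
rebuild it if it was not, and otherwise at least pin its TCP traffic to
the IPoIB addresses:

    # does the installed Open MPI have a verbs/UCX transport?
    ompi_info | grep -i -e openib -e ucx

    # if not, rebuild with InfiniBand support:
    # older Open MPI releases build the verbs BTL
    ./configure --with-verbs ...
    # newer releases build against UCX instead
    ./configure --with-ucx ...

    # fallback: keep the TCP BTL but force it onto the IPoIB subnet
    mpirun --mca btl_tcp_if_include 192.168.13.0/24 ./my_mpi_app

Running ibv_devinfo on the compute nodes is a quick way to confirm that
the HCA and the verbs stack are visible before you rebuild anything.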
Regards;
Ahmet M.
On 5.12.2019 15:07, sysadmin.caos wrote:
Hello,
Really, I don't know if my question belongs on this mailing list... but I
will explain my problem and then you can tell me whatever you think ;)
I manage a SLURM cluster composed of 3 networks:
* a gigabit network used for NFS shares (192.168.11.X). In this
network, my nodes are "node01, node02..." in /etc/hosts.
* a gigabit network used by SLURM (192.168.12.X): all my nodes are added
to the SLURM cluster using this network and the hostnames assigned to
this second network via /etc/hosts. In this network, my nodes are
"clus01, clus02..." in /etc/hosts.
* an InfiniBand network (192.168.13.X). In this network, my nodes are
"infi01, infi02..." in /etc/hosts.
When I submit an MPI job, the SLURM scheduler offers me "n" nodes called,
for example, clus01 and clus02, and there my application runs perfectly,
using the second network for SLURM connectivity and the first network
for NFS (and NIS) shares. By default, as SLURM connectivity is on the
second network, my nodelist contains nodes called "clus0x".
However, now I have a "new" problem. I want to use the third network
(InfiniBand), but as SLURM offers me "clus0x" (second network), my MPI
application runs OK but uses the second network. This problem also
occurs, for example, with the NAMD (charmrun) application.
So, my questions are:
1. Is this SLURM configuration correct for using both networks?
   1. If the answer is "no", how do I configure SLURM for my purpose?
   2. If the answer is "yes", how can I ensure the connections in my
      SLURM job go over InfiniBand?
Thanks a lot!!