Hi all,

Thank you all for the quick responses! My apologies for my own slow response: 
it turns out my spam filter got a bit too aggressive, and I only just found 
all of these.

As it happens, you are exactly right, and this was a case of "error exists 
between keyboard and chair". In particular, I wanted to thank John Hearns for 
pointing out the difference between pure InfiniBand and IPoIB. Once I saw 
that, I realized I should be looking at total transfer rates in iftop rather 
than at the transfer broken down by device. I could then see a big spike in 
traffic on ib0 exactly when communication started, which I take to mean that 
communication is happening over IB, but that iftop is unable to disaggregate 
the traffic. With regard to the other suggestions:

 - Re: John and Ryan: Indeed, we already tag our IB nodes in Slurm, so I 
couldn't agree more! Until now I was concerned that IB was not being used 
even when we chose IB nodes, but it turns out my fears were unfounded.
 - Re: Pierre-Marie: I had not thought to modify the 
NodeName/NodeHostName/NodeAddr variables. That seems like it could be quite a 
flexible solution, although I have to confess to being a little nervous about 
it.
 - Re: several users who suggested the use of OpenMPI: We happen to be using 
MPICH (for non-IB nodes) and MVAPICH2 (for IB nodes), in both cases compiled 
for Slurm and without the Hydra launcher (except when we were running non-Slurm 
tests, in which case we used dedicated Hydra-based configurations of both). We 
discounted OpenMPI about a year ago due to poor performance compared to 
MVAPICH2, but that was in the early days of our code development and may have 
been due to other factors. I will take another look at OpenMPI and see how 
things pan out now that we have a slightly better idea of what we are doing.
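
For anyone wanting to cross-check IB usage without relying on iftop: as far as 
I understand, iftop only sees traffic that passes through the kernel IP stack 
(i.e. IPoIB), whereas native InfiniBand traffic is counted by the HCA port 
counters exposed under /sys/class/infiniband. The following is a minimal 
Python sketch under those assumptions, not a definitive tool. The device name 
"mlx4_0" and port 1 are placeholders (list /sys/class/infiniband on your own 
nodes to find the real ones), the helper names are made up for illustration, 
and the factor of 4 assumes the usual convention that port_xmit_data is 
reported in units of 4 octets.

```python
import time
from pathlib import Path

# Assumption: standard Linux sysfs layout for InfiniBand port counters.
COUNTER_ROOT = Path("/sys/class/infiniband")


def read_counter(device, port, name, root=COUNTER_ROOT):
    """Read one port counter (e.g. port_xmit_data) and return it as an int."""
    path = Path(root) / device / "ports" / str(port) / "counters" / name
    return int(path.read_text().strip())


def bytes_per_second(before, after, seconds):
    """Convert two data-counter samples into a byte rate.

    Assumption: the counter is in units of 4 octets, hence the factor of 4.
    Counter saturation or wrap-around is not handled here.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) * 4 / seconds


def sample_xmit_rate(device, port, interval=1.0, root=COUNTER_ROOT):
    """Sample port_xmit_data twice, `interval` seconds apart."""
    first = read_counter(device, port, "port_xmit_data", root)
    time.sleep(interval)
    second = read_counter(device, port, "port_xmit_data", root)
    return bytes_per_second(first, second, interval)


if __name__ == "__main__":
    # "mlx4_0" and port 1 are placeholders; substitute your own HCA and port.
    rate = sample_xmit_rate("mlx4_0", 1)
    print(f"transmit rate: {rate / 1e6:.1f} MB/s")
```

Running this on an IB node while an MPI job is communicating should show the 
rate jump if the traffic really is going over native IB, independently of 
what iftop reports for ib0.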

Thanks again for all your help!

Regards,

Seb

-----Original Message-----
From: Ryan Novosielski [mailto:novos...@rutgers.edu] 
Sent: Wednesday, October 25, 2017 12:41 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] RE: Selecting a network interface with srun


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I think this is probably the best advice. I've personally never run into a set 
of circumstances in which MPI failed to pick the right interface unless there 
was a major misconfiguration (you'd of course want to test that on your 
system), but users might want more predictable behavior, or to be able to 
request InfiniBand specifically.

On 10/25/2017 11:22 AM, John Hearns wrote:
> What I would do is tag the Infiniband equipped nodes with a feature 
> called 'IB' or 'nonIB' for the others, and choose those nodes. (Sorry 
> - my head is in PBSPro world these days so that would be a 
> resources_available in that world)
> 

- --
 ____
 || \\UTGERS,     |----------------------*O*------------------------
 ||_// the State  |    Ryan Novosielski - novos...@rutgers.edu
 || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
 ||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
      `'
-----BEGIN PGP SIGNATURE-----

iFwEARECABwFAlnwvkoVHG5vdm9zaXJqQHJ1dGdlcnMuZWR1AAoJEJm/oGnRHLG+
QNUAn1EYfpqOtn1cyV/6qutVGP4kBt8pAKCR0IzgdnEF9sHGVWIB7jXcczFiiw==
=m3/O
-----END PGP SIGNATURE-----
