Hi all,

Thank you all for the quick responses! My apologies for my own slow reply: it turns out my spam filter got a bit too aggressive and I only just found all these messages.
As it happens, you are exactly right: this was a case of "error exists between keyboard and chair". In particular, I wanted to thank John Hearns for pointing out the difference between pure InfiniBand and IPoIB. Once I saw that, I realized I should be looking at the total transfer rates in iftop rather than at the transfer broken down by device. I could then see a big spike in traffic on ib0 exactly when communication started, which I take to mean that communication is happening over IB but that iftop is unable to disaggregate the traffic.

With regard to the other suggestions:

- Re: John and Ryan: Indeed, we do tag our IB nodes in Slurm, so I couldn't agree more! Until now I was concerned that the IB was not being used even when we chose IB nodes, but it turns out my fears were unfounded.

- Re: Pierre-Marie: I had not thought to modify the NodeName/NodeHostName/NodeAddr variables. That seems like it could be quite a flexible solution, although I have to confess to being a little nervous about it.

- Re: several users who suggested the use of OpenMPI: We happen to be using MPICH (for non-IB nodes) and MVAPICH2 (for IB nodes), in both cases compiled for Slurm and without the Hydra launcher (except when we were running non-Slurm tests, in which case we used dedicated Hydra-based configurations of both). We discounted OpenMPI about a year ago due to poor performance compared to MVAPICH2, but that was in the early days of our code development and may have been due to other factors. I will take another look at OpenMPI and see how things pan out now that we have a slightly better idea of what we are doing.

Thanks again for all your help!
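In case it is useful to anyone who finds this thread later, here is a rough Python sketch of the "look at the total rate on ib0" check described above, done without iftop. It assumes a Linux node and reads the per-interface byte counters from /proc/net/dev; the function names are made up for illustration. Note that these counters only cover traffic that passes through the kernel network device (for ib0, that is the IPoIB interface), so please treat this as a quick sanity check rather than a definitive measurement.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    totals = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        totals[name.strip()] = (int(fields[0]), int(fields[8]))
    return totals


def read_counters(path="/proc/net/dev"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def total_rate(before, after, seconds, iface="ib0"):
    """Combined rx + tx rate in bytes per second for one interface."""
    rx = after[iface][0] - before[iface][0]
    tx = after[iface][1] - before[iface][1]
    return (rx + tx) / seconds
```

To use it, call read_counters() once, wait a known interval while the MPI job is communicating, call it again, and pass both results and the interval to total_rate().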
Regards,
Seb

-----Original Message-----
From: Ryan Novosielski [mailto:novos...@rutgers.edu]
Sent: Wednesday, October 25, 2017 12:41 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] RE: Selecting a network interface with srun

I think this is probably the best advice. I've personally never run into a set of circumstances where MPI didn't pick the right interface where there wasn't a major misconfiguration (you'd of course want to test that on your system), but users might want more predictable behavior, or to be able to request Infiniband specifically.

On 10/25/2017 11:22 AM, John Hearns wrote:
> What I would do is tag the Infiniband equipped nodes with a feature
> called 'IB' or 'nonIB' for the others, and choose those nodes. (Sorry
> - my head is in PBSPro world these days so that would be a
> resources_available in that world)

--
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
Office of Advanced Res. Comp. - MSB C630, Newark
Rutgers, the State University of NJ