[slurm-dev] RE: Selecting a network interface with srun

2017-10-26 Thread Ryan Novosielski
On 10/26/2017 09:58 AM, Sebastian Eastham wrote: > As it happens, you are exactly right, and it turns out this was a case of "error exists between keyboard and chair". In particular I wanted to thank John Hearns for pointing out the difference [...]

[slurm-dev] RE: Selecting a network interface with srun

2017-10-26 Thread Sebastian Eastham
I think this is probably the best advice. I've personally never run into a set of circumstances where MPI didn't pick the right interface unless there was a major misconfiguration [...]

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread Ryan Novosielski
I think this is probably the best advice. I've personally never run into a set of circumstances where MPI didn't pick the right interface unless there was a major misconfiguration (you'd of course want to test that on your system), but users might [...]

[slurm-dev] Re: Selecting a network interface with srun

2017-10-25 Thread John Hearns
Ralph, indeed. As I have said before: finally, my one piece of advice to everyone managing batch systems. It is a name resolution problem. No, really, it is. Even if your cluster catches fire, the real reason that your jobs are not being submitted is that the DNS resolver is burning and the sch[...]

[slurm-dev] Re: Selecting a network interface with srun

2017-10-25 Thread r...@open-mpi.org
Good points. I would also caution against renaming nodes after their interfaces. This frequently causes failures in third-party software packages that compare the return value of “hostname” to the list of allocated nodes for optimization or placement purposes - e.g., mpirun! A quick grep of the mailing l[...]

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread John Hearns
When using “mpirun” we can specify “-iface ib0”... this is true, and the exact syntax depends on your MPI of choice, as noted above. However, don't get confused between IPoIB and InfiniBand itself. IPoIB is of course sending IP traffic over InfiniBand. An InfiniBand network can perfectly happily [...]
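A quick way to see both layers on a node (a minimal sketch, assuming standard Linux tools plus OFED's infiniband-diags are installed):

    ip addr show ib0   # the IPoIB interface and its IP address, if configured
    ibstat             # the underlying HCA ports and their native link state

Traffic steered to ib0 is IP-over-InfiniBand; a verbs-based MPI transport talks to the HCA directly and never touches ib0 at all.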

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread Le Biot, Pierre-Marie
Hi Sebastian, Another solution could be to change the configuration of the nodes in slurm.conf, making use of NodeName and NodeHostname (and NodeAddr if needed): “NodeName: Name that Slurm uses to refer to a node [...]. Typically this would be the string that "/bin/hostname -s" returns. [...] It may [...]
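A minimal sketch of what such an entry could look like (the node names here are made up for illustration, and node001-ib must resolve to the ib0 address):

    # slurm.conf: Slurm calls the node node001, but the controller
    # reaches slurmd via the IB-side name node001-ib
    NodeName=node001 NodeHostname=node001 NodeAddr=node001-ib CPUs=16 State=UNKNOWN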

[slurm-dev] Re: Selecting a network interface with srun

2017-10-24 Thread Gilles Gouaillardet
FWIW, with Open MPI, ib0 can be selected with "export OMPI_MCA_btl_openib_if_include=ib0", assuming Slurm was not configured to withhold this environment variable from the tasks. Gilles On 10/25/2017 12:55 PM, Paul Hargrove wrote: > The "-ifac[...]
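Spelled out, the recipe might look like this (my_mpi_app is a placeholder, and srun must be left at its default of exporting the submission environment):

    export OMPI_MCA_btl_openib_if_include=ib0
    srun -n 64 ./my_mpi_app

Any Open MPI MCA parameter can be set this way, via an environment variable named OMPI_MCA_ followed by the parameter name.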

[slurm-dev] Re: Selecting a network interface with srun

2017-10-24 Thread Paul Hargrove
The "-iface ib0" syntax is used by the hydra launcher if MPICH and its derivatives such as MVAPICH. I suggest http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager as a starting point. -Paul On Tue, Oct 24, 2017 at 8:19 PM, r...@open-mpi.org wrote: > “ibface” isn’t an OpenMPI

[slurm-dev] Re: Selecting a network interface with srun

2017-10-24 Thread r...@open-mpi.org
“ibface” isn’t an OpenMPI command-line option, so I suspect you are using something other than OpenMPI. For OMPI, you could specify the interface via an MCA param in the environment or the default MCA parameter file. Most MPI implementations have a similar mechanism - you might check your documentation.
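As a sketch of the second option, the per-user MCA parameter file is plain "name = value" lines:

    # $HOME/.openmpi/mca-params.conf (applies to every Open MPI job you run)
    btl_tcp_if_include = ib0

The same parameter could instead be set per job on the command line or through the environment, as noted elsewhere in this thread.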

[slurm-dev] Re: Selecting a network interface with srun

2017-10-24 Thread Doug Meyer
Hi, I believe that if you are using OpenMPI you can declare the interface. By default it should select the fastest interface on the system. This FAQ may be of help: https://www.open-mpi.org/faq/?category=tcp If I have misunderstood the problem, I apologize. Best of luck! Doug [...]
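Following that FAQ, the TCP BTL can be pinned to (or steered away from) specific interfaces on the command line; a sketch, with my_mpi_app as a placeholder (the include and exclude parameters are mutually exclusive):

    mpirun --mca btl_tcp_if_include ib0 -n 64 ./my_mpi_app
    # or, equivalently, list the interfaces to avoid (once you override
    # the default exclude list, loopback must be excluded explicitly):
    mpirun --mca btl_tcp_if_exclude lo,eth0 -n 64 ./my_mpi_app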