I had exactly the same problem.
Trying to run MPI between two separate machines, with each machine having
two ethernet ports, causes really weird behaviour on the most basic code.
I had to disable one of the ethernet ports on each of the machines
and it worked just fine after that. No idea why, though!
Open MPI cannot handle having two interfaces on a node on the same subnet. I
believe it has to do with our matching code when we try to match up a
connection.
The result is the hang you observe. I also believe it is not good practice to
have two interfaces on the same subnet.
If you put them on different subnets, the problem should go away.
+1
It is definitely bad Linux practice to have 2 ports on the same subnet.
If you still want that configuration, however (e.g., you have some conditions
in your environment that make it workable), you can make Open MPI only use one
or more of those interfaces via the btl_tcp_if_include (or btl_tcp_if_exclude)
MCA parameter.
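For example, a minimal sketch (the interface names, host names, and
application name are placeholders, not from the original report):

  # Use only eth0 for MPI TCP traffic:
  mpirun --mca btl_tcp_if_include eth0 -np 2 --host node1,node2 ./my_mpi_app

  # Or exclude the second port instead. When overriding btl_tcp_if_exclude,
  # remember to keep excluding the loopback interface as well:
  mpirun --mca btl_tcp_if_exclude lo,eth1 -np 2 --host node1,node2 ./my_mpi_app

Note that btl_tcp_if_include and btl_tcp_if_exclude are mutually exclusive;
set one or the other, not both.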
Did you have both of the ethernet ports on the same subnet, or were they on
different subnets?
On Feb 17, 2012, at 5:36 AM, Richard Bardwell wrote:
> I had exactly the same problem.
> Trying to run MPI between two separate machines, with each machine having
> two ethernet ports, causes really weird behaviour on the most basic code.
Yes, they were on the same subnet. I guess that is the problem.
----- Original Message -----
From: "Jeff Squyres"
To: "Open MPI Users"
Sent: Friday, February 17, 2012 4:20 PM
Subject: Re: [OMPI users] Problem running an mpi application on nodes with
more than one interface
Did you have both of the ethernet ports on the same subnet, or were they on
different subnets?
Yes, I did.
Because it was the same NIC with two ports, each capable of delivering 5 Gb/s,
I never thought they would need to be on different subnets.
But once I changed the subnet for one of the ports on both nodes, it
seemed to work.
Also, I am looking for a good way to start understanding the implementation
level details for OpenMPI. Can you point me to some good source?
(PS: To start with, I have already read the FAQ section)
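A hedged sketch of the subnet change described above (the device names and
addresses are invented for illustration):

  # Before: eth0 and eth1 on both nodes sat in 192.168.1.0/24,
  # which confused Open MPI's TCP connection matching.
  # After: move the second port to its own subnet on each node:
  ip addr flush dev eth1
  ip addr add 192.168.2.10/24 dev eth1    # on node 1
  # ip addr add 192.168.2.11/24 dev eth1  # on node 2

With eth1 in its own subnet on both nodes, each interface pair matches
unambiguously and the hang goes away.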
Dave,
Thanks for the suggestion, adding "-mca plm ^rshd" did force mpirun to
spawn things via qrsh rather than SSH. My problem is solved!
--
Brian McNally
On 02/16/2012 03:05 AM, Dave Love wrote:
Brian McNally writes:
Hi Dave,
I looked through the INSTALL, VERSION, NEWS, and README files
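For the archives, the working launch under Grid Engine looked roughly like
this (the job-script details, PE name, and application name are assumptions,
not from the original message):

  #$ -pe orte 16
  # Disable the (accidentally shipped) rshd launcher so Open MPI
  # falls back to its Grid Engine support and spawns via qrsh:
  mpirun -mca plm ^rshd -np $NSLOTS ./my_mpi_app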
Hi guys
Our apologies - the rshd launcher isn't supposed to be in a release branch.
We've removed it for the next release.
Sorry for the problem... :-(
On Fri, Feb 17, 2012 at 11:42 AM, Brian McNally wrote:
> Dave,
>
> Thanks for the suggestion, adding "-mca plm ^rshd" did force mpirun to
> spawn things via qrsh rather than SSH. My problem is solved!
On Feb 17, 2012, at 11:59 AM, Jingcha Joba wrote:
> Also, I am looking for a good way to start understanding the implementation
> level details for OpenMPI. Can you point me to some good source?
> (PS: To start with, I have already read the FAQ section)
Unfortunately, there isn't a lot of good documentation about Open MPI's
internals beyond the source code itself.