Re: [OMPI users] which eth interface does mpi use by default when torque supplies it with a hostfile?

2010-05-28 Thread Ralph Castain
On May 28, 2010, at 3:29 PM, Rahul Nabar wrote:
> Each of our servers has twin eth cards: 1GigE and 10GigE. How does
> openmpi decide which card to use when sending messages? One of the
> cards is on a 10.0. IP address subnet whereas the other cards are on a
> 192.168 address subnet. Can I sel

Re: [OMPI users] [torqueusers] which eth interface does mpi use by default when torque supplies it with a hostfile?

2010-05-28 Thread George Wm Turner
Open MPI is very aggressive about looking for and using any TCP communications device it can find. In your case it will use both the 10.0.. network and the 192.168.. network at the same time. Open MPI does not pay attention to the host names for the communications channel. You want to do somet

[OMPI users] which eth interface does mpi use by default when torque supplies it with a hostfile?

2010-05-28 Thread Rahul Nabar
Each of our servers has twin eth cards: 1GigE and 10GigE. How does openmpi decide which card to use when sending messages? One of the cards is on a 10.0. IP address subnet whereas the other cards are on a 192.168 address subnet. Can I select one or the other by specifying the --host option with

Re: [OMPI users] MPI daemon error

2010-05-28 Thread Rahul Nabar
On Fri, May 28, 2010 at 3:53 PM, Ralph Castain wrote:
> What environment are you running on the cluster, and what version of OMPI?
> Not sure that error message is coming from us.
openmpi-1.4.1
The cluster runs PBS-Torque. So I guess that could be the other error source.
-- Rahul

Re: [OMPI users] MPI daemon error

2010-05-28 Thread Ralph Castain
What environment are you running on the cluster, and what version of OMPI? Not sure that error message is coming from us.
On May 28, 2010, at 1:18 PM, Rahul Nabar wrote:
> Often when I try and run larger jobs on our cluster I get an error of
> this sort from some of the compute-servers:
>
>

[OMPI users] MPI daemon error

2010-05-28 Thread Rahul Nabar
Often when I try and run larger jobs on our cluster I get an error of this sort from some of the compute-servers:
eu260 - daemon did not report back when launched
It does not happen every time, but pretty often. Any ideas what could be wrong? The node seems pingable and I could log in suc