Re: [OMPI users] MPI daemon error

2010-05-29, Ralph Castain
On May 29, 2010, at 11:35 AM, Rahul Nabar wrote:
> On Sat, May 29, 2010 at 8:19 AM, Ralph Castain wrote:
>
>>
>>> From your other note, it sounds like #3 might be the problem here. Do you
>>> have some nodes that are configured with "eth0" pointing to your 10.x
>>> network, and other nodes w

Re: [OMPI users] MPI daemon error

2010-05-29, Rahul Nabar
On Sat, May 29, 2010 at 8:19 AM, Ralph Castain wrote:
>
> >From your other note, it sounds like #3 might be the problem here. Do you
> >have some nodes that are configured with "eth0" pointing to your 10.x
> >network, and other nodes with "eth0" pointing to your 192.x network? I have
> >found
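A minimal sketch of the consistency check Ralph is asking about, assuming you have already collected each node's eth0 IPv4 address by whatever means your site uses. The function names, hostnames, and addresses below are hypothetical. It groups nodes by the /8 network ("10.x" vs "192.x") their eth0 address falls in; more than one group means eth0 does not point at the same network on every node.

```python
import ipaddress
from collections import defaultdict


def group_by_eth0_network(eth0_addrs, prefix=8):
    """Group node names by the /prefix network containing their eth0 address.

    eth0_addrs: hypothetical mapping of node name -> eth0 IPv4 address string.
    """
    groups = defaultdict(list)
    for node, addr in sorted(eth0_addrs.items()):
        # strict=False masks off the host bits, e.g. 192.168.1.13/8 -> 192.0.0.0/8
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups[str(net)].append(node)
    return dict(groups)


def eth0_is_consistent(eth0_addrs, prefix=8):
    """True if every node's eth0 address falls in the same /prefix network."""
    return len(group_by_eth0_network(eth0_addrs, prefix)) <= 1


if __name__ == "__main__":
    # Hypothetical cluster: two nodes on 10.x, one on 192.x
    nodes = {"node01": "10.0.0.11", "node02": "10.0.0.12", "node03": "192.168.1.13"}
    for net, members in group_by_eth0_network(nodes).items():
        print(net, members)
    print("consistent:", eth0_is_consistent(nodes))
```

Grouping by /8 mirrors the "10.x"/"192.x" wording in the thread; pass a larger `prefix` if your networks are split more finely.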

Re: [OMPI users] MPI daemon error

2010-05-29, Ralph Castain
There are some timeout issues you can run into with large clusters on Torque; check the Torque website for explanations and instructions on what to do about them. However, that doesn't appear to be the problem here. If our daemon doesn't report back, it is typically due to one or more of the followi

Re: [OMPI users] MPI daemon error

2010-05-28, Rahul Nabar
On Fri, May 28, 2010 at 3:53 PM, Ralph Castain wrote:
> What environment are you running on the cluster, and what version of OMPI?
> Not sure that error message is coming from us.

openmpi-1.4.1

The cluster runs PBS-Torque, so I guess that could be the other error source.

--
Rahul

Re: [OMPI users] MPI daemon error

2010-05-28, Ralph Castain
What environment are you running on the cluster, and what version of OMPI? Not sure that error message is coming from us.

On May 28, 2010, at 1:18 PM, Rahul Nabar wrote:
> Often when I try to run larger jobs on our cluster, I get an error of
> this sort from some of the compute servers:
>
>