[OMPI users] How to add nodes while running job

2011-08-27 Thread Rafael Braga
I would like to know how to add nodes during a job execution. My hostfile currently lists the node 10.0.0.23, which is off; I would like to start this node during the execution so that the job can use it. When I run the command: mpirun -np 2 -hostfile /tmp/hosts application the following message appears: ssh: conn

Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Ralph Castain
OMPI has no way of knowing that you will turn the node on at some future point. All it can do is try to launch the job on the provided node, which fails because the node doesn't respond. You'll have to come up with some scheme for telling the node to turn on in anticipation of starting a job -
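One possible workaround for the launch failure described above is to filter the hostfile before calling mpirun, so that only nodes that currently respond are passed to the launcher. The following is a minimal sketch under stated assumptions, not an Open MPI feature: the helper names `ssh_reachable` and `filter_hostfile` are hypothetical, and the check assumes that a node accepting TCP connections on port 22 is ready to be used. It does not add nodes to a job that is already running; it only avoids the failure at launch time.

```python
import socket


def ssh_reachable(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: an open SSH port means the node is up and usable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def filter_hostfile(lines, is_up=ssh_reachable):
    """Keep only hostfile entries whose host passes the is_up check.

    Blank lines and lines starting with '#' are dropped. The host is
    taken to be the first whitespace-separated field, so entries such
    as '10.0.0.22 slots=2' are kept intact when the host responds.
    """
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        host = entry.split()[0]
        if is_up(host):
            kept.append(entry)
    return kept
```

The filtered entries could then be written to a new file and passed to mpirun with -hostfile in place of /tmp/hosts. Passing the check in as `is_up` keeps the filtering logic testable without touching the network.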

Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Rayson Ho
On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
> OMPI has no way of knowing that you will turn the node on at some future
> point. All it can do is try to launch the job on the provided node, which
> fails because the node doesn't respond.
> You'll have to come up with some scheme for telli

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Joshua Hursey
There is a 'self' checkpointer (CRS component) that does application level checkpointing - exposed at the MPI level. I don't know how different what you are working on is, but maybe something like that could be harnessed. Note that I have not tested the 'self' checkpointer with the process migra

Re: [OMPI users] How to add nodes while running job

2011-08-27 Thread Ralph Castain
On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote:
> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
>> OMPI has no way of knowing that you will turn the node on at some future
>> point. All it can do is try to launch the job on the provided node, which
>> fails because the node doesn't respond

Re: [OMPI users] Related to project ideas in OpenMPI

2011-08-27 Thread Ralph Castain
Let's chat off-list about it - I don't see exactly how this works, but it may be similar enough.
On Aug 27, 2011, at 8:30 AM, Joshua Hursey wrote:
> There is a 'self' checkpointer (CRS component) that does application level
> checkpointing - exposed at the MPI level. I don't know how differen

Re: [OMPI users] ConnectX with InfiniHost IB HCAs

2011-08-27 Thread Yevgeny Kliteynik
Egor, If updating OFED doesn't solve the problem (and I kinda have the feeling that it does), you might want to try this mailing list for IB interoperability questions: linux-r...@vger.kernel.org
-- YK
On 26-Aug-11 4:42 PM, Shamis, Pavel wrote:
> You may try to update your OFED version. I think