I would like to know how to add nodes during a job execution.
My hostfile currently contains the node 10.0.0.23, which is powered off.
I would like to start this node during the execution so that the job can use it.
When I run the command:
mpirun -np 2 -hostfile /tmp/hosts application
the following message appears:
ssh: conn
OMPI has no way of knowing that you will turn the node on at some future point.
All it can do is try to launch the job on the provided node, which fails
because the node doesn't respond.
You'll have to come up with some scheme for telling the node to turn on in
anticipation of starting a job -
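One possible scheme, sketched under stated assumptions: power the node on before the job starts, wait until it accepts SSH connections, and only then call mpirun. This does not add a node to a job that is already running. It only avoids launching against a node that cannot answer yet. The sketch assumes the node supports Wake-on-LAN (enabled in its NIC and firmware) and that mpirun reaches nodes over SSH on port 22. The helper names (`magic_packet`, `send_wol`, `wait_for_ssh`, `power_on_and_launch`) are hypothetical, not part of Open MPI.

```python
import socket
import subprocess
import time
from typing import Sequence


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times (102 bytes in total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


def wait_for_ssh(host: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll TCP port 22 until it accepts a connection or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, 22), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def power_on_and_launch(node: str, mac: str, mpirun_args: Sequence[str]) -> None:
    """Wake `node`, wait for sshd, then run mpirun with the given arguments."""
    send_wol(mac)
    if not wait_for_ssh(node):
        raise RuntimeError(f"{node} did not come up; not launching the job")
    # The node now answers on port 22, so mpirun's ssh-based launch can reach it.
    subprocess.run(["mpirun", *mpirun_args], check=True)
```

For the setup described in the question, a call might look like `power_on_and_launch("10.0.0.23", "00:11:22:33:44:55", ["-np", "2", "-hostfile", "/tmp/hosts", "application"])`, where the MAC address is a placeholder for the node's real one. Depending on the network, the broadcast address may need to be the subnet broadcast (for example 10.0.0.255) rather than 255.255.255.255.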
On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
> OMPI has no way of knowing that you will turn the node on at some future
> point. All it can do is try to launch the job on the provided node, which
> fails because the node doesn't respond.
> You'll have to come up with some scheme for telli
There is a 'self' checkpointer (CRS component) that does application-level
checkpointing, exposed at the MPI level. I don't know how different your work
is from that, but something like it might be harnessed. Note that
I have not tested the 'self' checkpointer with the process migra
On Aug 27, 2011, at 8:28 AM, Rayson Ho wrote:
> On Sat, Aug 27, 2011 at 9:12 AM, Ralph Castain wrote:
>> OMPI has no way of knowing that you will turn the node on at some future
>> point. All it can do is try to launch the job on the provided node, which
>> fails because the node doesn't respond
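With application-level checkpointing, which the 'self' CRS component is described above as providing, the program itself saves and restores its state rather than relying on a system-level process image. The exact callback interface that the 'self' component expects is not shown in this thread. The sketch below is therefore a generic Python illustration of the technique, not the Open MPI API. The function names and the JSON state file are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: dump to a temp file in the same directory, then
    rename it over the old checkpoint so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or a copy of `default` if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


def run(path: str, total_steps: int, checkpoint_every: int = 10) -> dict:
    """Toy iterative computation that resumes from its last saved checkpoint."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, so a later run can continue
    return state
```

Calling `run(path, 20)` and then `run(path, 30)` continues from step 20 instead of starting over. A restarted or migrated process would recover its position the same way.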
Let's chat off-list about it - I don't see exactly how this works, but it may
be similar enough.
On Aug 27, 2011, at 8:30 AM, Joshua Hursey wrote:
> There is a 'self' checkpointer (CRS component) that does application level
> checkpointing - exposed at the MPI level. I don't know how differen
Egor,
If updating OFED doesn't solve the problem (and I kinda have the
feeling that it does), you might want to try this mailing list
for IB interoperability questions:
linux-r...@vger.kernel.org
-- YK
On 26-Aug-11 4:42 PM, Shamis, Pavel wrote:
> You may try to update your OFED version. I think