Hello,
I'm using openmpi-1.3a1r18241 on a 2-node configuration and having trouble
with ompi-restart. I can successfully ompi-checkpoint and ompi-restart a
1-way MPI code.
When I try a 2-way job running across 2 nodes, I get
bash-2.05b$ ompi-restart -verbose ompi_global_snapshot_926.ckpt
[
Josh Hursey wrote:
On Apr 23, 2008, at 4:04 PM, Sharon Brunett wrote:
Hello,
I'm using openmpi-1.3a1r18241 on a 2 node configuration and having
troubles with the ompi-restart. I can successfully ompi-checkpoint
and ompi-restart a 1 way mpi code.
When I try a 2 way job running acr
Josh,
I'm responding to some outstanding questions about the env. I'm trying to
ompi-restart in.
My answers to your questions are sprinkled below, and include a few more
questions based on attempts I've made to get a multi-node restart working.
thanks,
Sharon
Sharon Brune
I'm finding that using ompi-checkpoint on an application which is very CPU-bound takes a very long time. For example, trying to checkpoint a 4- or 8-way Pallas MPI Benchmark application can take more than an hour. The problem is not where I'm dumping checkpoints (I've tried local and an NFS mou
r hostfile
pushed to mpirun properly.
thanks for your help!
Sharon
Josh Hursey wrote:
On Apr 25, 2008, at 6:12 PM, Sharon Brunett wrote:
Josh,
I'm responding to some outstanding questions about the env. I'm
trying to ompi-restart in.
My answers to your questions are sprinkled belo
Josh Hursey wrote:
On Apr 29, 2008, at 12:55 AM, Sharon Brunett wrote:
I'm finding that using ompi-checkpoint on an application which is
very cpu bound takes a very very long time. For example, trying to
checkpoint a 4 or 8 way Pallas MPI Benchmark application can take
more than an
s that
I can explore in the checkpoint/restart framework in Open MPI.
If this is critical for you I might be able to take a look at it, but
I can't say when. :(
-- Josh
On Apr 29, 2008, at 1:07 PM, Sharon Brunett wrote:
Josh Hursey wrote:
On Apr 29, 2008, at 12:55 AM, Sharon Brunett w
-- Josh
On Apr 29, 2008, at 1:07 PM, Sharon Brunett wrote:
Josh Hursey wrote:
On Apr 29, 2008, at 12:55 AM, Sharon Brunett wrote:
I'm finding that using ompi-checkpoint on an application which is
very cpu bound takes a very very long time. For example, trying to
checkpoint a 4 or 8 way Pal
I'm using is based on the algorithm used by LAM/MPI, but
implemented at a higher level. There are a number of improvements
that
I can explore in the checkpoint/restart framework in Open MPI.
If this is critical for you I might be able to take a look at it, but
I can't say when. :(
-- Josh
Does Open MPI v 4.1.0 or v 4.1.1 support using IP interfaces that have more
than one IP address?