I just pushed some new timing code for the CRCP Coord component in r18439:
https://svn.open-mpi.org/trac/ompi/changeset/18439

This should allow you to see the checkpoint progress through the coordination protocol, and provide some rough timing on the different parts of the algorithm.

To activate it, add the MCA parameter "-mca crcp_coord_timing 2" to your mpirun command line.
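For example, borrowing the launch line used later in this thread (the executable name and process count are just placeholders):

mpirun -np 16 -am ft-enable-cr -mca crcp_coord_timing 2 ./a.out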

No algorithmic changes were made, so this will not fix the problem; it will just give you a view of the protocol's activity.

-- Josh


On May 14, 2008, at 1:11 PM, Josh Hursey wrote:

Tamer,

How much communication does your application tend to do? As reported below, if there is a lot of communication between checkpoints, then it may take a while to checkpoint the application, since the current implementation of the coordination algorithm checks every message at checkpoint time. So what you are seeing might be that the checkpoint is taking an extremely long time to clear the channel.

I have a few things in the works that attempt to fix this problem. They are not ready just yet, but I'll make it known when they are. You can get some diagnostics by setting "-mca crcp_coord_verbose 10" on the command line, but it is fairly coarse-grained at the moment (I have some improvements in the pipeline here as well).
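For example (again reusing the launch line from the runs below; the executable name and process count are just illustrative):

mpirun -np 16 -am ft-enable-cr -mca crcp_coord_verbose 10 ./a.out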

Cheers,
Josh

On May 13, 2008, at 3:42 PM, Tamer wrote:

Hi Josh: I am currently using Open MPI r18291, and when I run a 12-task job on 3 quad-core nodes I am able to checkpoint and restart several times at the beginning of the run. However, after a few hours, when I try to checkpoint, the code just hangs; it won't checkpoint and won't give me an error message. Has this problem been reported before? All the required executables and libraries are in my path.

Thanks,
Tamer


On Apr 29, 2008, at 1:37 PM, Sharon Brunett wrote:

Thanks, I'll try the version you recommend below!

Josh Hursey wrote:
Your previous email indicated that you were using r18241. I committed in r18276 a patch that should fix this problem. Let me know if you still see it after that update.

Cheers,
Josh

On Apr 29, 2008, at 3:18 PM, Sharon Brunett wrote:

Josh,
I'm also having trouble using ompi-restart on a snapshot made from a run which was previously checkpointed. In other words, restarting a previously restarted run!

(a) start the run
mpirun -np 16 -am ft-enable-cr ./a.out

<--- do an ompi-checkpoint on the mpirun pid from (a) from another terminal --->
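(For reference, that checkpoint step looks roughly like the following, assuming the mpirun PID is the 30086 that appears in the snapshot name below:)

ompi-checkpoint 30086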

(b) restart the checkpointed run

ompi-restart ompi_global_snapshot_30086.ckpt

<--- do an ompi-checkpoint on the mpirun pid from (b) from another terminal --->

(c) restart the checkpointed run
ompi-restart ompi_global_snapshot_30120.ckpt

--------------------------------------------------------------------------
mpirun noticed that process rank 12 with PID 30480 on node shc005 exited on signal 13 (Broken pipe).
--------------------------------------------------------------------------
-bash-2.05b$

I can restart the previous (30086) ckpt but not the latest one made from a restarted run.

Any insights would be appreciated.

thanks,
Sharon



Josh Hursey wrote:
Sharon,

This is, unfortunately, to be expected at the moment for this type of application. Extremely communication intensive applications will most likely cause the implementation of the current coordination algorithm to slow down significantly. This is because on a checkpoint Open MPI does a peerwise check on the description of (possibly) each message to make sure there are no messages in flight. So for a huge number of messages this could take a long time.

This is a performance problem with the current implementation of the algorithm that we use in Open MPI. I've been meaning to go back and improve this, but it has not been critical to do so since applications that perform in this manner are outliers in HPC. The coordination algorithm I'm using is based on the algorithm used by LAM/MPI, but implemented at a higher level. There are a number of improvements that I can explore in the checkpoint/restart framework in Open MPI.

If this is critical for you I might be able to take a look at it, but I can't say when. :(

-- Josh

On Apr 29, 2008, at 1:07 PM, Sharon Brunett wrote:

Josh Hursey wrote:
On Apr 29, 2008, at 12:55 AM, Sharon Brunett wrote:

I'm finding that using ompi-checkpoint on an application which is very cpu bound takes a very, very long time. For example, trying to checkpoint a 4 or 8 way Pallas MPI Benchmark application can take more than an hour. The problem is not where I'm dumping checkpoints (I've tried local and an nfs mount with plenty of space, and cpu intensive apps checkpoint quickly).

I'm using BLCR_VERSION=0.6.5 and openmpi-1.3a1r18241.

Is this condition common, and if so, are there possibly MCA parameters which could help?
It depends on how you configured Open MPI with checkpoint/restart. There are two modes of operation: no threads, and with a checkpoint thread. They are described a bit more in the Checkpoint/Restart Fault Tolerance User's Guide on the wiki:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR

By default we compile without the checkpoint thread. The restriction here is that all processes must be in the MPI library in order to make progress on the global checkpoint. For CPU intensive applications this may cause quite a delay in the time to start, and subsequently finish, a checkpoint. I'm guessing that this is what you are seeing.

If you configure with the checkpoint thread (add '--enable-mpi-threads --enable-ft-thread' to ./configure) then Open MPI will create a thread that runs with each application process. This thread is fairly lightweight and will make sure that a checkpoint progresses even when the process is not in the Open MPI library.

Try enabling the checkpoint thread and see if that helps improve the checkpoint time.
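A minimal sketch of such a configure line (the install prefix here is just a placeholder; the BLCR location is taken from the configure line quoted further down):

./configure --prefix=/opt/openmpi-cr --with-ft=cr --with-blcr=/opt/blcr --enable-mpi-threads --enable-ft-thread

After reconfiguring, rebuild and reinstall Open MPI before re-running the checkpoint test.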
Josh,
First... please pardon the blunder in my earlier mail. Comms-bound apps are the ones taking a while to checkpoint, not cpu-bound. In any case, I tried configuring with the above two configure options, but still no luck on improving checkpointing times or gaining completion on larger MPI task runs being checkpointed.

It looks like the checkpointing is just hanging. For example, I can checkpoint a 2-way comms-bound code (1 task on each of two nodes) ok. When I ask for a 4-way run on 2 nodes, 30 minutes after the ompi-checkpoint of the mpirun PID I only see 1 ckpt directory with data in it!


/home/sharon/ompi_global_snapshot_25400.ckpt/0
-bash-2.05b$ ls -l *
opal_snapshot_0.ckpt:
total 0

opal_snapshot_1.ckpt:
total 0

opal_snapshot_2.ckpt:
total 0

opal_snapshot_3.ckpt:
total 1868
-rw-------  1 sharon shc-support 1907476 2008-04-29 10:49
ompi_blcr_context.1850
-rw-r--r--  1 sharon shc-support      33 2008-04-29 10:49
snapshot_meta.data
-bash-2.05b$ pwd


The file system getting the checkpoints is local. I've tried /scratch and others as well.

I can checkpoint some codes (like xhpl) just fine across 8 MPI tasks (t nodes), dumping 254M total. Thus, the very long/stuck checkpointing seems rather application dependent.

Here's how I configured openmpi:

./configure --prefix=/nfs/ds01/support/sharon/openmpi-1.3a1r18241 --enable-mpi-threads --enable-ft-thread --with-ft=cr --enable-shared --enable-mpi-threads=posix --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-mvapi=/usr/mellanox --with-blcr=/opt/blcr



Thanks for any further insights you may have.
Sharon