On Mar 21, 2010, at 12:58 PM, Addepalli, Srirangam V wrote:
> Yes, we have seen this behavior too.
> Another behavior I have seen is that one MPI process starts to
> show a different elapsed time than its peers. Is this because a
> checkpoint happened on behalf of this process?
> R
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On
> Behalf Of ananda.mu...@wipro.com [ananda.mu...@wipro.com]
> Sent: Saturday, March 20, 2010 10:18 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] top command output shows huge CPU utilization
> when openmpi processes resume after the checkpoint
> When I checkpoint my Open MPI application using ompi-checkpoint, I
> see that the top command suddenly shows very large numbers in its
> "%CPU" field, such as 150%, 200%, etc. After some time, these
> numbers come back down to normal values under 100%. This happens
> exactly around the time the checkpoint completes and the processes
> resume execution.
One cause of this type of CPU utilization is the C/R thread. During
normal (non-checkpoint) processing, this thread polls for a
checkpoint fairly aggressively. You can change how aggressive the
thread is by adjusting the two parameters below:
http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_check
http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_wait
I usually set the latter to:
opal_cr_thread_sleep_wait=1000
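For instance, a minimal sketch of setting this on the mpirun command
line, or persistently in the per-user MCA parameter file (the
application name and process count are placeholders):

  # Raise the C/R thread's sleep interval at launch time
  # (-am ft-enable-cr enables checkpoint/restart support):
  mpirun -np 4 -am ft-enable-cr \
         -mca opal_cr_thread_sleep_wait 1000 ./my_app

  # Or make the setting persistent for your user:
  echo "opal_cr_thread_sleep_wait = 1000" >> $HOME/.openmpi/mca-params.conf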
You can also turn off the C/R thread entirely, either by configuring
Open MPI without it, or by disabling it at runtime by setting the
'opal_cr_use_thread' parameter to '0':
http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_use_thread
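As a sketch, assuming a BLCR-enabled build (paths and process counts
are placeholders):

  # Disable the C/R thread at runtime:
  mpirun -np 4 -am ft-enable-cr -mca opal_cr_use_thread 0 ./my_app

  # Or leave it out of the build entirely: the thread is only compiled
  # in when --enable-ft-thread is given, so simply omit that option:
  ./configure --with-ft=cr --with-blcr=/path/to/blcr --enable-mpi-threads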
The CPU increase during the checkpoint may be due to both the Open MPI
C/R thread and the BLCR thread becoming active on the machine. You
might try to determine whether this is BLCR's CPU utilization or Open
MPI's by creating a single-process application and watching the CPU
utilization when checkpointing it with BLCR alone. You may also want
to look at the memory consumption of the process to make sure that
there is enough for BLCR to run efficiently.
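A rough sketch of such a test, where single_proc_app stands in for
any long-running non-MPI program:

  # Start the program under BLCR's preloader:
  cr_run ./single_proc_app &
  PID=$!

  # Watch per-thread CPU usage in another terminal:
  top -H -p $PID

  # Take a checkpoint and see how the CPU and memory numbers react:
  cr_checkpoint $PID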
This may also be due to processes that have finished their checkpoint
waiting on peer processes to finish theirs. I don't think we have a
good way to control how aggressively these waiting processes poll for
the completion of their peers. If this becomes a problem, we can look
into adding a parameter similar to opal_cr_thread_sleep_wait to
throttle that polling on the machine.
The disadvantage of making the various polling-for-completion loops
less aggressive is that the checkpoint and/or the application may
stall for a little longer than necessary. But if this is acceptable
to the user, then they can adjust the MCA parameters as necessary.
> Another behavior I have seen is that one MPI process starts to show
> a different elapsed time than its peers. Is this because a
> checkpoint happened on behalf of this process?
Can you explain a bit more about what you mean by this? Neither Open
MPI nor BLCR messes with the timer on the machine, so we are not
changing it in any way. The process is 'stopped' briefly while BLCR
takes the checkpoint, so this will extend the running time of the
process. How much the running time is extended (a.k.a. checkpoint
overhead) is determined by a number of factors, but primarily by the
storage device(s) that the checkpoint is being written to.
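One way to see that overhead directly is to time the checkpoint
itself; a sketch, where <PID> is the PID of the mpirun process:

  # The elapsed time is a rough measure of how long the checkpoint
  # (coordination plus writing to storage) took end-to-end:
  time ompi-checkpoint <PID>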
> For your reference, I am using Open MPI 1.3.4 and BLCR 0.8.2 for
> checkpointing.
It would be interesting to know if you see the same behavior with the
trunk or v1.5 series of Open MPI.
Hope that helps,
Josh
> Thanks
> Anand