On Mar 21, 2010, at 12:58 PM, Addepalli, Srirangam V wrote:

Yes, we have seen this behavior too.

Another behavior I have seen is that one MPI process starts to show a different elapsed time than its peers. Is it because the checkpoint happened on behalf of this process?

R
________________________________________
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of ananda.mu...@wipro.com [ananda.mu...@wipro.com]
Sent: Saturday, March 20, 2010 10:18 PM
To: us...@open-mpi.org
Subject: [OMPI users] top command output shows huge CPU utilization when openmpi processes resume after the checkpoint

When I checkpoint my Open MPI application using ompi-checkpoint, the top command suddenly shows very large values in the "CPU %" field, such as 150% or 200%. After some time, these values return to normal, under 100%. This happens right around the time the checkpoint completes and the processes resume execution.

One cause of this type of CPU utilization is the C/R thread. During normal (non-checkpoint) processing, the thread polls for a checkpoint request fairly aggressively. You can change how aggressive the thread is by adjusting the two parameters below:
 http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_check
 http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_wait

I usually set the latter to:
 opal_cr_thread_sleep_wait=1000

You can also turn off the C/R thread, either by configuring without it or by disabling it at runtime by setting the 'opal_cr_use_thread' parameter to '0':
 http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_use_thread
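
As a rough illustration of passing these parameters at launch time, here is a small Python sketch that builds an mpirun command line using mpirun's "-mca <name> <value>" option. The helper name mca_args, the process count, and the application path ./my_app are hypothetical, and any other options your installation needs to enable checkpoint/restart are deliberately left out.

```python
def mca_args(params):
    """Turn {"name": value} into ["-mca", "name", "value", ...] for mpirun."""
    args = []
    for name, value in params.items():
        args += ["-mca", name, str(value)]
    return args


# Hypothetical launch: throttle the C/R thread's polling.
throttled = (["mpirun", "-np", "4"]
             + mca_args({"opal_cr_thread_sleep_wait": 1000})
             + ["./my_app"])

# Hypothetical launch: disable the C/R thread at runtime instead.
no_thread = (["mpirun", "-np", "4"]
             + mca_args({"opal_cr_use_thread": 0})
             + ["./my_app"])

print(" ".join(throttled))
print(" ".join(no_thread))
```

The resulting lists can be handed to subprocess.run() as-is, or the printed lines can be pasted into a shell.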


The CPU increase during the checkpoint may be due to both the Open MPI C/R thread and the BLCR thread becoming active on the machine. You might try to determine whether this is BLCR's CPU utilization or Open MPI's by creating a single-process application and watching the CPU utilization when checkpointing with BLCR. You may also want to look at the memory consumption of the process to make sure that there is enough memory for BLCR to run efficiently.

This may also be due to processes that have finished their part of the checkpoint waiting for peer processes to finish. I don't think we have a good way to control how aggressively these waiting processes poll for completion of their peers. If this becomes a problem, we can look into adding a parameter similar to opal_cr_thread_sleep_wait to throttle the polling on the machine.

The disadvantage of making the various polling-for-completion loops less aggressive is that the checkpoint and/or the application may stall for a little longer than necessary. But if this is acceptable to the user, then they can adjust the MCA parameters as necessary.
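
To make that trade-off concrete, here is a toy model in Python. It is not Open MPI's actual implementation. It assumes a poller that checks at times 0, s, 2s, ... (s being the sleep interval, in arbitrary time units) and an event, such as a checkpoint request or a peer finishing, that arrives at a given time. The function name polling_cost and the numbers are made up for illustration.

```python
def polling_cost(event_time, sleep_interval):
    """Return (polls, detection_delay) for a poller checking at 0, s, 2s, ...

    event_time:     when the awaited event occurs (arbitrary integer units)
    sleep_interval: time slept between checks (hypothetical stand-in for a
                    setting such as opal_cr_thread_sleep_wait)
    """
    if sleep_interval <= 0:
        raise ValueError("sleep_interval must be positive in this model")
    # Smallest k with k * sleep_interval >= event_time (integer ceiling).
    k = -(-event_time // sleep_interval)
    polls = k + 1                           # checks at 0, s, ..., k*s
    delay = k * sleep_interval - event_time  # time the event goes unnoticed
    return polls, delay


for interval in (1, 10, 100, 1000):
    polls, delay = polling_cost(event_time=2505, sleep_interval=interval)
    print(f"interval={interval:>4}  polls={polls:>4}  delay={delay}")
```

In this model, a longer interval cuts the number of wakeups (less CPU spent polling) but lets the event sit unnoticed for up to one full interval, which is the extra stall described above.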


Another behavior I have seen is that one MPI process starts to show a different elapsed time than its peers. Is it because the checkpoint happened on behalf of this process?

Can you explain a bit more about what you mean by this? Neither Open MPI nor BLCR messes with the timer on the machine, so we are not changing it in any way. The process is 'stopped' briefly while BLCR takes the checkpoint, so this will extend the running time of the process. How much the running time is extended (a.k.a. checkpoint overhead) is determined by a bunch of things, but primarily by the storage device(s) that the checkpoint is being written to.


For your reference, I am using Open MPI 1.3.4 and BLCR 0.8.2 for checkpointing.

It would be interesting to know if you see the same behavior with the trunk or v1.5 series of Open MPI.

Hope that helps,
Josh


Thanks
Anand


