On Wed, Mar 31, 2010 at 7:39 PM, Addepalli, Srirangam V <[email protected]> wrote:
> Hello All.
> I am trying to checkpoint an MPI application that has been started using the
> following mpirun command
>
> mpirun -am ft-enable-cr -np 8 pw.x < Ge46.pw.in > Ge46.ph.out
>
> ompi-checkpoint 31396 (works). However, when I try to terminate the process
>
> ompi-checkpoint --term 31396 it never finishes. How do I debug this issue?

ompi-checkpoint --term is exactly ompi-checkpoint + sending SIGTERM to your app. If plain ompi-checkpoint finishes but the --term variant hangs, then your app is not dealing with SIGTERM correctly. Make sure you're not ignoring SIGTERM: you need to either handle it or let it kill your app. If it's a multithreaded app, make sure you can "distribute" the SIGTERM to ALL the threads, i.e., when you receive SIGTERM, notify all other threads that they should join or quit.

Regards,
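
P.S. To make the "distribute SIGTERM to all the threads" idea concrete, here is a minimal sketch. It is in Python only for brevity, and it is not taken from your application or from Open MPI: the names stop_requested, handle_sigterm, worker and run are made up for illustration. A C or Fortran code would follow the same pattern with its own signal API: the handler only sets a shared flag, every thread polls that flag, and the main thread joins the workers before exiting.

```python
import signal
import threading
import time

# Shared flag telling every worker thread to stop (hypothetical name).
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: only record that shutdown was requested.
    stop_requested.set()


def worker(results, index):
    # Stand-in for real computation: loop until shutdown is requested.
    while not stop_requested.is_set():
        time.sleep(0.01)
    results[index] = "stopped"


def run(num_threads=4):
    # Python runs signal handlers in the main thread, so install it there.
    signal.signal(signal.SIGTERM, handle_sigterm)
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    # Wait for SIGTERM, then join every worker so the process can exit.
    while not stop_requested.is_set():
        time.sleep(0.01)
    for t in threads:
        t.join()
    return results
```

Calling run() from the main thread blocks until the process receives SIGTERM, at which point all workers see the flag, finish, and get joined. If any thread never checks the flag, the join never returns, and that is the kind of hang you would see from the outside as a termination that never finishes.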
