How did you configure Open MPI? Is your application using SIGUSR1?
This error message indicates that Open MPI's daemons could not communicate with the application processes. The daemons send SIGUSR1 to the application process to initiate the checkpoint handshake (you can change this signal with -mca opal_cr_signal). If your application does not respond to the daemon within a time bound (20 seconds by default; you can change it with -mca snapc_full_max_wait_time), this error is printed and the checkpoint is aborted.
-- Josh

On Sep 22, 2009, at 1:43 AM, Mallikarjuna Shastry wrote:
<error.txt>

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users