Hi everyone,

I wrote a small program with a function that triggers the checkpointing mechanism, as follows:

############################################
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void trigger_checkpoint();

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 10");

    trigger_checkpoint();

    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 10");

    printf("I am processor no %d of a total of %d procs \n", rank, size);
    system("sleep 10");

    printf("bye \n");
    MPI_Finalize();
    return 0;
}

void trigger_checkpoint()
{
    printf("hi\n");
    system("ompi-checkpoint -v `pidof mpirun` ");
}
#############################################

The application works fine on my laptop, which runs Ubuntu. However, when I tried running it on one of the machines at my uni, which has SUSE Linux installed, the application hangs as soon as ompi-checkpoint is triggered. This is what I get:

##########################################################
I am processor no 0 of a total of 1 procs
hi
I am processor no 0 of a total of 1 procs
[sun06:15426] orte_checkpoint: Checkpointing...
[sun06:15426] PID 15411
[sun06:15426] Connected to Mpirun [[12727,0],0]
[sun06:15426] orte_checkpoint: notify_hnp: Contact Head Node Process PID 15411
##########################################################

Does anyone have any ideas about this?

Thanks a lot,
Jean