On Nov 6, 2009, at 7:59 AM, Kritiraj Sajadah wrote:

> Hi Everyone,
> I have installed openmpi 1.3 and blcr 0.81 on my laptop (single
> processor).
>
> I am trying to checkpoint a small test application:
>
> ###########
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <signal.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 10");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 10");
>     printf("I am processor no %d of a total of %d procs \n", rank, size);
>     system("sleep 10");
>     printf("mpisleep bye \n");
>     MPI_Finalize();
>     return 0;
> }
> ###################
>
> I compile it as follows:
>
> mpicc mpisleep.c -o mpisleep
>
> and I run it as follows:
>
> mpirun -am ft-enable-cr -np 2 mpisleep
>
> When I try checkpointing it (ompi-checkpoint -v 8118), it checkpoints fine,
> but when I restart it, I get the following:
>
> I am processor no 0 of a total of 2 procs
> I am processor no 1 of a total of 2 procs
> mpisleep bye
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 8118 on node raj-laptop exited on
> signal 13 (Broken pipe).
> --------------------------------------------------------------------------

Does the behavior change if you remove the 'system()' calls and replace them with 'sleep()'? The 'system()' call is essentially shorthand for fork/exec: it runs the command through a shell in a child process and waits for it. fork/exec has been known to cause problems when called by an MPI process.

Give that a try and let me know if it helps.

-- Josh

>
> Any suggestions are very much appreciated.
>
> Raj
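
For reference, the replacement suggested above could look roughly like the sketch below. `pause_seconds` is a hypothetical helper name that does not appear in the original post; a plain `sleep(10)` from `<unistd.h>` would do the same job in the common case. The loop only covers the case where `sleep()` returns early because a signal handler ran. Whether this change actually avoids the broken pipe on restart has not been verified here.

```c
#include <unistd.h>

/* Hypothetical helper (not from the original post): wait `seconds` inside
 * the calling process, with no fork/exec and no child shell.
 * sleep() returns the number of unslept seconds when a signal handler
 * interrupts it, so keep sleeping until nothing remains.
 * Returns how many times the sleep was interrupted (0 in the usual case). */
int pause_seconds(unsigned int seconds)
{
    int interruptions = 0;
    unsigned int remaining = sleep(seconds);

    while (remaining > 0) {
        interruptions++;
        remaining = sleep(remaining);
    }
    return interruptions;
}
```

In mpisleep.c, each `system("sleep 10");` line would then become `pause_seconds(10);` (or simply `sleep(10);`). The test program already includes `<unistd.h>`, so no other change should be needed before recompiling with mpicc.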