That seems like a bug to me. What version of Open MPI are you using? How have you set up the C/R functionality (which MCA options have you set, and which command-line options are you using)? Can you send a small reproducing application that we can test against?
That should help us focus in on the problem a bit.

-- Josh

On Wed, Aug 31, 2011 at 6:36 AM, Faisal Shahzad <itsfa...@hotmail.com> wrote:
> Dear Group,
> I have an MPI program in which every process communicates with its
> neighbors. When SELF-checkpointing, every process writes to a separate file.
> The problem is that sometimes, after making a checkpoint, the program does
> not continue. Increasing the number of processes makes the problem worse.
> With just 1 process (no communication), SELF-checkpointing works normally
> with no problems.
> I have tried different '--mca btl' parameters (openib,tcp,sm,self), but
> the problem persists.
> I would very much appreciate your support regarding it.
> Kind regards,
> Faisal

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey