Dear Group,
I have an MPI program in which every process communicates with its
neighbors. When SELF-checkpointing, every process writes to a separate
file. The problem is that sometimes, after a checkpoint is taken, the program
does not continue. The more processes I use, the more severe the problem
becomes. With just 1 process (no communication), SELF-checkpointing works
normally with no problems. I have tried different '--mca btl' parameters
(openib, tcp, sm, self), but the problem persists. I would very much
appreciate your support with this.
Kind regards,
Faisal
