When you receive that callback, MPI has been put into a quiescent state. As
such, it does not allow MPI communication until the checkpoint is completely
finished, so you cannot call MPI_Barrier in the checkpoint callback. Since Open
MPI is doing a coordinated checkpoint, you can assume that all processes
are calling the same callback at about the same time (the coordination
algorithm synchronizes them for you).

If you would like a notification callback before the quiescence protocol
starts, you might want to look at the INC callbacks:
  http://osl.iu.edu/research/ft/ompi-cr/api.php#api-cr_inc_register_callback
They are available in the Open MPI trunk (v1.7). The OMPI_CR_INC_PRE_CRS_PRE_MPI
callback will give you immediate notice, and you -should- be able to make
MPI calls in that callback. I have not tried it, but conceptually it should
work. If it does not, I can file a bug ticket and we can look into
addressing it.
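
To make the ordering concrete, here is a toy model in plain C with no MPI
dependency. It is only a sketch of the sequence described above. Every name
in it (quiesced, toy_barrier, on_pre_quiesce, on_checkpoint,
toy_checkpoint_cycle) is a hypothetical stand-in, not an Open MPI function,
and the -1 return is an assumption of the model: the real library does not
necessarily report the failure this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are Open MPI functions. */

static bool quiesced = false;  /* true while the library is quiescent */

/* Stand-in for a collective such as MPI_Barrier. In this model it can
 * only complete while communication is still allowed.
 * Returns 0 on success, -1 if communication is refused. */
static int toy_barrier(void)
{
    return quiesced ? -1 : 0;
}

/* Stand-in for a notification delivered before quiescence (the role
 * the OMPI_CR_INC_PRE_CRS_PRE_MPI callback is expected to play). */
static int on_pre_quiesce(void)
{
    return toy_barrier();
}

/* Stand-in for the checkpoint callback, which runs after quiescence. */
static int on_checkpoint(void)
{
    return toy_barrier();
}

/* Drive one checkpoint cycle in the order described above and record
 * what each callback observed when it tried to synchronize. */
static void toy_checkpoint_cycle(int *pre_rc, int *ckpt_rc)
{
    assert(pre_rc != NULL && ckpt_rc != NULL);

    quiesced = false;
    *pre_rc = on_pre_quiesce();  /* communication still allowed */
    quiesced = true;             /* coordination quiesces the library */
    *ckpt_rc = on_checkpoint();  /* communication refused */
    quiesced = false;            /* checkpoint finished, communication resumes */
}
```

Calling toy_checkpoint_cycle(&pre_rc, &ckpt_rc) leaves pre_rc at 0 and
ckpt_rc at -1 in this model, which mirrors the advice above: do any
synchronization in the pre-quiescence callback, not in the checkpoint
callback.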

-- Josh

On Wed, Feb 15, 2012 at 4:23 AM, Faisal Shahzad <itsfa...@hotmail.com> wrote:

>  Dear Group,
>
> I wanted to do a synchronization check with 'MPI_Barrier(MPI_COMM_WORLD)'
> in 'opal_crs_self_user_checkpoint(char **restart_cmd)' call. Although every
> process is present in this call, it fails to synchronize. Is there any
> reason why we can't use barrier?
> Thanks in advance.
>
> Kind regards,
> Faisal
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
