This is fine if each thread interacts with a different window, no?

Jeff
On Sun, Feb 19, 2017 at 5:32 PM Nathan Hjelm <hje...@me.com> wrote:

> You cannot perform synchronization at the same time as communication on
> the same target. This means that if one thread is in
> MPI_Put/MPI_Get/MPI_Accumulate (target) you can't have another thread in
> MPI_Win_flush (target) or MPI_Win_flush_all(). If your program is doing
> that, it is not a valid MPI program. If you want to ensure that a
> particular put operation is complete, try MPI_Rput instead.
>
> -Nathan
>
> > On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:
> >
> > All,
> >
> > We are trying to combine MPI_Put and MPI_Win_flush on locked (using
> > MPI_Win_lock_all) dynamic windows to mimic a blocking put. The
> > application is (potentially) multi-threaded, so we rely on
> > MPI_THREAD_MULTIPLE support being available.
> >
> > When I try to use this combination (MPI_Put + MPI_Win_flush) in our
> > application, I am seeing threads occasionally hang in MPI_Win_flush,
> > probably waiting for some progress to happen. However, when I try to
> > create a small reproducer (attached; the original application has
> > multiple layers of abstraction), I am seeing fatal errors in
> > MPI_Win_flush if using more than one thread:
> >
> > ```
> > [beryl:18037] *** An error occurred in MPI_Win_flush
> > [beryl:18037] *** reported by process [4020043777,2]
> > [beryl:18037] *** on win pt2pt window 3
> > [beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
> > [beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
> > [beryl:18037] ***    and potentially your MPI job)
> > ```
> >
> > I could only trigger this on dynamic windows with multiple concurrent
> > threads running.
> >
> > So: is this a valid MPI program (except for the missing clean-up at the
> > end ;))? It seems to run fine with MPICH, but maybe they are more
> > tolerant of some programming errors...
> >
> > If it is a valid MPI program, I assume there is some race condition in
> > MPI_Win_flush that leads to the fatal error (or to the hang that I
> > observe otherwise)?
> >
> > I tested this with Open MPI 1.10.5 on a single-node Linux Mint 18.1
> > system with stock kernel 4.8.0-36 (aka my laptop). Open MPI and the
> > test were both compiled using GCC 5.3.0. I could not run it with
> > Open MPI 2.0.2 due to the fatal error in MPI_Win_create (which also
> > applies to MPI_Win_create_dynamic; see my other thread, not sure if
> > they are related).
> >
> > Please let me know whether this is a valid use case, and whether I can
> > provide you with additional information if required.
> >
> > Many thanks in advance!
> >
> > Cheers
> > Joseph
> >
> > --
> > Dipl.-Inf. Joseph Schuchart
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstr. 19
> > D-70569 Stuttgart
> >
> > Tel.: +49(0)711-68565890
> > Fax: +49(0)711-6856832
> > E-Mail: schuch...@hlrs.de
> >
> > <ompi_flush_hang.c>
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
