This is fine if each thread interacts with a different window, no?

Jeff
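For illustration, a minimal sketch of that pattern (my own, not from the original reproducer; it uses MPI_Win_allocate rather than dynamic windows for brevity, and NTHREADS plus the OpenMP parallelism are assumptions): each thread puts and flushes only on its own window, so communication and synchronization can never collide on the same window.

```
#include <mpi.h>
#include <omp.h>

#define NTHREADS 4  /* illustrative assumption */

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Win win[NTHREADS];
    int    *base[NTHREADS];

    /* Window creation is collective, so create one window per thread
     * up front, then open a passive-target epoch on each. */
    for (int t = 0; t < NTHREADS; t++) {
        MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &base[t], &win[t]);
        MPI_Win_lock_all(0, win[t]);
    }

#pragma omp parallel num_threads(NTHREADS)
    {
        int t      = omp_get_thread_num();
        int value  = t;
        int target = 0;  /* all threads target rank 0, each via its own window */

        /* Put + flush on win[t] only: no other thread touches win[t],
         * so a put and a flush never overlap on the same window. */
        MPI_Put(&value, 1, MPI_INT, target, 0, 1, MPI_INT, win[t]);
        MPI_Win_flush(target, win[t]);
    }

    for (int t = 0; t < NTHREADS; t++) {
        MPI_Win_unlock_all(win[t]);
        MPI_Win_free(&win[t]);
    }
    MPI_Finalize();
    return 0;
}
```

Compile with something like `mpicc -fopenmp`.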
On Sun, Feb 19, 2017 at 5:32 PM Nathan Hjelm <hje...@me.com> wrote:

> You cannot perform synchronization at the same time as communication
> on the same target. This means that if one thread is in
> MPI_Put/MPI_Get/MPI_Accumulate (target), you cannot have another
> thread in MPI_Win_flush (target) or MPI_Win_flush_all(). If your
> program is doing that, it is not a valid MPI program. If you want to
> ensure that a particular put operation is complete, try MPI_Rput
> instead.
>
> -Nathan
>
> > On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:
> >
> > All,
> >
> > We are trying to combine MPI_Put and MPI_Win_flush on locked (using
> > MPI_Win_lock_all) dynamic windows to mimic a blocking put. The
> > application is (potentially) multi-threaded, so we are relying on
> > MPI_THREAD_MULTIPLE support being available.
> >
> > When I try to use this combination (MPI_Put + MPI_Win_flush) in our
> > application, I am seeing threads occasionally hang in MPI_Win_flush,
> > probably waiting for some progress to happen. However, when I try to
> > create a small reproducer (attached; the original application has
> > multiple layers of abstraction), I am seeing fatal errors in
> > MPI_Win_flush when using more than one thread:
> >
> > ```
> > [beryl:18037] *** An error occurred in MPI_Win_flush
> > [beryl:18037] *** reported by process [4020043777,2]
> > [beryl:18037] *** on win pt2pt window 3
> > [beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
> > [beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
> > [beryl:18037] ***    and potentially your MPI job)
> > ```
> >
> > I could only trigger this on dynamic windows with multiple concurrent
> > threads running.
> >
> > So: is this a valid MPI program (except for the missing clean-up at
> > the end ;))? It seems to run fine with MPICH, but maybe they are more
> > tolerant of some programming errors...
> >
> > If it is a valid MPI program, I assume there is some race condition
> > in MPI_Win_flush that leads to the fatal error (or to the hang that I
> > observe otherwise)?
> >
> > I tested this with Open MPI 1.10.5 on a single-node Linux Mint 18.1
> > system with stock kernel 4.8.0-36 (aka my laptop). Open MPI and the
> > test were both compiled using GCC 5.3.0. I could not run it with
> > Open MPI 2.0.2 due to the fatal error in MPI_Win_create (which also
> > applies to MPI_Win_create_dynamic; see my other thread, not sure if
> > they are related).
> >
> > Please let me know whether this is a valid use case and whether I can
> > provide additional information if required.
> >
> > Many thanks in advance!
> >
> > Cheers
> > Joseph
> >
> > --
> > Dipl.-Inf. Joseph Schuchart
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstr. 19
> > D-70569 Stuttgart
> >
> > Tel.: +49(0)711-68565890
> > Fax: +49(0)711-6856832
> > E-Mail: schuch...@hlrs.de
> >
> > <ompi_flush_hang.c>

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
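For reference, a minimal sketch of Nathan's MPI_Rput suggestion (blocking_put is a hypothetical helper name, not from the attached code; it assumes the window has already been locked with MPI_Win_lock_all):

```
#include <mpi.h>

/* Hypothetical helper: a blocking put built on MPI_Rput, callable from
 * multiple threads on the same window without any thread needing a
 * concurrent MPI_Win_flush on that window. Assumes `win` is already
 * locked via MPI_Win_lock_all. */
static void blocking_put(const void *src, int count, MPI_Datatype type,
                         int target, MPI_Aint disp, MPI_Win win)
{
    MPI_Request req;

    /* Request-based put: completion is tracked per operation rather
     * than per window/target. */
    MPI_Rput(src, count, type, target, disp, count, type, win, &req);

    /* MPI_Wait completes this one operation at the origin (the source
     * buffer may be reused afterwards). Remote completion at the
     * target still requires a later flush or the final unlock. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```

Note that waiting on a request-based put guarantees origin-side completion only; if the caller needs the data to be visible at the target, a flush (from a single thread at a time) or the closing MPI_Win_unlock_all is still required.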