You cannot perform synchronization at the same time as communication on the 
same target. That means if one thread is in MPI_Put/MPI_Get/MPI_Accumulate 
on a target, you can't have another thread in MPI_Win_flush on that target or 
in MPI_Win_flush_all(). A program that does this is not a valid MPI program. 
If you want to ensure that a particular put operation is complete, try 
MPI_Rput instead.
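
As a rough sketch of what I mean (window setup omitted; this assumes the 
window is already locked with MPI_Win_lock_all, and the function name and 
arguments are placeholders):

```c
#include <mpi.h>

/* Sketch: complete one specific put via a request instead of calling a
 * window-wide flush from another thread. `win` must already be in a
 * passive-target access epoch (e.g. after MPI_Win_lock_all). */
static void put_and_complete(const void *buf, int count, int target,
                             MPI_Aint disp, MPI_Win win)
{
    MPI_Request req;
    MPI_Rput(buf, count, MPI_BYTE, target, disp, count, MPI_BYTE, win, &req);
    /* Completing the request finishes this particular operation, with no
     * MPI_Win_flush that could race against other threads' RMA calls. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```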

-Nathan

> On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:
> 
> All,
> 
> We are trying to combine MPI_Put and MPI_Win_flush on dynamic windows locked 
> via MPI_Win_lock_all to mimic a blocking put. The application is (potentially) 
> multi-threaded, so we rely on MPI_THREAD_MULTIPLE support being available.
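> 
> In essence, each thread does something like the following. This is a 
> simplified sketch (names such as local_mem, src, and target_disp are 
> illustrative; the real code sits behind several layers of abstraction):
> 
> ```c
> /* Window setup, done once: a dynamic window locked for passive-target RMA. */
> MPI_Win win;
> MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> MPI_Win_attach(win, local_mem, local_size);
> MPI_Win_lock_all(0, win);
> 
> /* Per thread, potentially concurrently: mimic a blocking put. With dynamic
>  * windows, target_disp is an absolute address on the target, obtained via
>  * MPI_Get_address and exchanged beforehand. */
> MPI_Put(src, count, MPI_BYTE, target_rank, target_disp, count, MPI_BYTE, win);
> MPI_Win_flush(target_rank, win); /* wait for completion at the target */
> ```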
> 
> When I try to use this combination (MPI_Put + MPI_Win_flush) in our 
> application, I am seeing threads occasionally hang in MPI_Win_flush, probably 
> waiting for some progress to happen. However, when I try to create a small 
> reproducer (attached; the original application has multiple layers of 
> abstraction), I am seeing fatal errors in MPI_Win_flush when using more than 
> one thread:
> 
> ```
> [beryl:18037] *** An error occurred in MPI_Win_flush
> [beryl:18037] *** reported by process [4020043777,2]
> [beryl:18037] *** on win pt2pt window 3
> [beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
> [beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
> [beryl:18037] ***    and potentially your MPI job)
> ```
> 
> I could only trigger this on dynamic windows with multiple concurrent threads 
> running.
> 
> So: Is this a valid MPI program (except for the missing clean-up at the end 
> ;))? It seems to run fine with MPICH, but maybe they are just more tolerant 
> of some programming errors...
> 
> If it is a valid MPI program, I assume there is some race condition in 
> MPI_Win_flush that leads to the fatal error (or to the hang that I otherwise 
> observe)?
> 
> I tested this with OpenMPI 1.10.5 on a single-node Linux Mint 18.1 system 
> with the stock 4.8.0-36 kernel (a.k.a. my laptop). OpenMPI and the test were 
> both compiled with GCC 5.3.0. I could not run the test with OpenMPI 2.0.2 due 
> to the fatal error in MPI_Win_create (which also affects 
> MPI_Win_create_dynamic; see my other thread, not sure whether the two issues 
> are related).
> 
> Please let me know whether this is a valid use case; I am happy to provide 
> additional information if required.
> 
> Many thanks in advance!
> 
> Cheers
> Joseph
> 
> -- 
> Dipl.-Inf. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
> 
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuch...@hlrs.de
> 
> <ompi_flush_hang.c>
