Nathan,

Thanks for your clarification. Just so that I understand where my misunderstanding of this matter comes from: could you please point me to the place in the standard that prohibits thread-concurrent window synchronization using MPI_Win_flush[_all]? I cannot seem to find such a passage in either 11.5.4 (Flush and Sync) or 12.4 (MPI and Threads). The latter explicitly excludes waiting on the same request object (which we do not do) and collective operations on the same communicator (which MPI_Win_flush is not), but it does not mention one-sided non-collective synchronization operations. Any hint would be much appreciated.

We will look at MPI_Rput and MPI_Rget. However, a single put paired with a flush is just the simplest case. We also want to support multiple asynchronous operations that are eventually synchronized on a per-thread basis, where keeping track of the request handles might not be feasible.
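For reference, here is a minimal sketch of the request-based variant we would try first (names and buffer setup are illustrative only; the window is assumed to have been locked beforehand with MPI_Win_lock_all):

```
#include <mpi.h>

/* Illustrative sketch only: complete a single put per thread via its
 * request instead of a window-wide MPI_Win_flush. */
static void put_and_wait(const double *buf, int count, int target,
                         MPI_Aint target_disp, MPI_Win win)
{
    MPI_Request req;

    /* Start the put and obtain a request handle for this one operation. */
    MPI_Rput(buf, count, MPI_DOUBLE, target, target_disp, count, MPI_DOUBLE,
             win, &req);

    /* Completing the request guarantees local completion (the origin buffer
     * may be reused); completion at the target still requires a flush. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```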

Thanks,
Joseph

On 02/20/2017 02:30 AM, Nathan Hjelm wrote:
You cannot perform synchronization at the same time as communication on the same target. This means that if one thread is in MPI_Put/MPI_Get/MPI_Accumulate (target), you cannot have another thread in MPI_Win_flush (target) or MPI_Win_flush_all(). If your program does that, it is not a valid MPI program. If you want to ensure that a particular put operation is complete, try MPI_Rput instead.

-Nathan

On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:

All,

We are trying to combine MPI_Put and MPI_Win_flush on dynamic windows locked using MPI_Win_lock_all to mimic a blocking put. The application is (potentially) multi-threaded, and we are thus relying on MPI_THREAD_MULTIPLE support being available.
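To illustrate, the core of the pattern looks roughly like this (a simplified sketch, not the attached reproducer; window creation, memory attachment, and error checking are omitted, and `win` is assumed to be a dynamic window locked with MPI_Win_lock_all):

```
#include <mpi.h>

/* Sketch of the pattern: each thread issues a put and flushes the target
 * to make the put behave like a blocking operation. Called concurrently
 * from multiple threads under MPI_THREAD_MULTIPLE. */
static void blocking_put(const double *buf, int count, int target,
                         MPI_Aint target_disp, MPI_Win win)
{
    MPI_Put(buf, count, MPI_DOUBLE, target, target_disp, count, MPI_DOUBLE,
            win);
    /* Wait until this thread's put has completed at the target. */
    MPI_Win_flush(target, win);
}
```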

When I try to use this combination (MPI_Put + MPI_Win_flush) in our application, I am seeing threads occasionally hang in MPI_Win_flush, probably waiting for some progress to happen. However, when I try to create a small reproducer (attached; the original application has multiple layers of abstraction), I am seeing fatal errors in MPI_Win_flush when using more than one thread:

```
[beryl:18037] *** An error occurred in MPI_Win_flush
[beryl:18037] *** reported by process [4020043777,2]
[beryl:18037] *** on win pt2pt window 3
[beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
[beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[beryl:18037] ***    and potentially your MPI job)
```

I could only trigger this on dynamic windows with multiple concurrent threads 
running.

So: Is this a valid MPI program (except for the missing clean-up at the end ;))? It seems to run fine with MPICH, but maybe they are just more tolerant of some programming errors...

If it is a valid MPI program, am I right to assume that some race condition in MPI_Win_flush leads to the fatal error (or, in the original application, the hang)?

I tested this with OpenMPI 1.10.5 on a single-node Linux Mint 18.1 system with stock kernel 4.8.0-36 (aka my laptop). OpenMPI and the test were both compiled using GCC 5.3.0. I could not run it with OpenMPI 2.0.2 due to the fatal error in MPI_Win_create (which also applies to MPI_Win_create_dynamic; see my other thread, I am not sure whether the two issues are related).

Please let me know whether this is a valid use case and whether I can provide any additional information.

Many thanks in advance!

Cheers
Joseph


<ompi_flush_hang.c>

--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
