Ping :) I would really appreciate any input on my question below. I
crawled through the standard but cannot seem to find the wording that
prohibits thread-concurrent access and synchronization.
Using MPI_Rget works in our case, but MPI_Rput only guarantees local
completion, not remote completion. Specifically, a thread-parallel
application would have to enter a serial region just to issue an
MPI_Win_flush before a thread can read back a value it previously wrote to
the same target. Re-reading remote values from the same process/thread may
not be efficient, but it is a valid use case for us.
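Roughly, the pattern I have in mind looks like the following minimal sketch
(placeholder names; it assumes a window that is already locked with
MPI_Win_lock_all and an MPI_THREAD_MULTIPLE environment):
```c
#include <mpi.h>

/* Sketch only (placeholder names): win is assumed to be locked with
 * MPI_Win_lock_all and MPI was initialized with MPI_THREAD_MULTIPLE. */
void put_then_read(MPI_Win win, int target, MPI_Aint disp, int value)
{
    MPI_Request req;

    /* MPI_Wait on the MPI_Rput request gives local completion only:
     * the origin buffer may be reused, but the value is not
     * necessarily visible at the target yet. */
    MPI_Rput(&value, 1, MPI_INT, target, disp, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* Remote completion still requires a flush; with thread-concurrent
     * flushes ruled out, this call would have to happen in a serial
     * region of the otherwise thread-parallel code. */
    MPI_Win_flush(target, win);

    /* Only now is re-reading the value from the same target well
     * defined; MPI_Wait on the MPI_Rget request delivers the data to
     * the origin buffer. */
    int check;
    MPI_Rget(&check, 1, MPI_INT, target, disp, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    (void)check;
}
```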
Best regards,
Joseph
On 02/20/2017 09:23 AM, Joseph Schuchart wrote:
Nathan,
Thanks for your clarification. Just so that I understand where my
misunderstanding of this matter comes from: can you please point me to
the place in the standard that prohibits thread-concurrent window
synchronization using MPI_Win_flush[_all]? I cannot find such a passage
in either 11.5.4 (Flush and Sync) or 12.4 (MPI and Threads). The latter
explicitly restricts multiple threads waiting on the same request object
(which does not apply here) and collective operations on the same
communicator (which MPI_Win_flush is not), but it does not mention
non-collective one-sided synchronization operations. Any hint would be
much appreciated.
We will look at MPI_Rput and MPI_Rget. However, having a single put
paired with a flush is just the simplest case. We also want to support
multiple asynchronous operations that are eventually synchronized on a
per-thread basis, where keeping all the request handles around might not
be feasible.
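The pattern we would like to support looks roughly like the following
sketch (placeholder names only, again assuming a lock-all epoch and
MPI_THREAD_MULTIPLE):
```c
#include <mpi.h>

/* Sketch only (placeholder names): each thread issues several puts,
 * possibly to different targets, and later completes all of them with
 * a single per-thread flush instead of tracking one request handle
 * per operation. */
void batched_puts(MPI_Win win, const int *targets, const MPI_Aint *disps,
                  const int *values, int n)
{
    for (int i = 0; i < n; ++i) {
        /* Plain MPI_Put: no request handle to keep around. */
        MPI_Put(&values[i], 1, MPI_INT, targets[i], disps[i],
                1, MPI_INT, win);
    }

    /* One flush per thread to complete everything issued so far at
     * all targets; this is the call whose thread-concurrent use is in
     * question here. */
    MPI_Win_flush_all(win);
}
```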
Thanks,
Joseph
On 02/20/2017 02:30 AM, Nathan Hjelm wrote:
You cannot perform synchronization at the same time as communication
on the same target. This means that if one thread is in
MPI_Put/MPI_Get/MPI_Accumulate on a target, you can't have another thread
in MPI_Win_flush on that target or in MPI_Win_flush_all(). If your program
is doing that, it is not a valid MPI program. If you want to ensure that a
particular put operation is complete, try MPI_Rput instead.
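Sketched with placeholder names (this is only an illustration of the
restriction, not a complete program):
```c
#include <mpi.h>

/* Not allowed: with win locked via MPI_Win_lock_all, these two calls
 * must not execute concurrently in different threads of one process:
 *
 *   thread 0:  MPI_Put(..., target, ..., win);
 *   thread 1:  MPI_Win_flush(target, win);   // or MPI_Win_flush_all(win)
 *
 * The request-based alternative completes an individual operation
 * without a flush: */
void wait_for_one_put(MPI_Win win, int target, MPI_Aint disp, int value)
{
    MPI_Request req;
    MPI_Rput(&value, 1, MPI_INT, target, disp, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* local completion of this put */
}
```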
-Nathan
On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de>
wrote:
All,
We are trying to combine MPI_Put and MPI_Win_flush on locked (using
MPI_Win_lock_all) dynamic windows to mimic a blocking put. The
application is (potentially) multi-threaded and we are thus relying
on MPI_THREAD_MULTIPLE support to be available.
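The pattern boils down to roughly the following (this is only a sketch of
what I described, not the attached reproducer; the ring-style communication
and all names are placeholders):
```c
#include <mpi.h>
#include <stdio.h>

/* Sketch only: dynamic window under MPI_Win_lock_all, with MPI_Put
 * immediately followed by MPI_Win_flush to mimic a blocking put.
 * In the real code the put/flush part runs in several threads. */
int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Win win;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Expose one integer per process and exchange its address, which
     * serves as the displacement into the dynamic window. */
    int local = -1;
    MPI_Win_attach(win, &local, sizeof(local));
    MPI_Aint disp;
    MPI_Get_address(&local, &disp);
    MPI_Aint all_disps[size];                 /* C99 VLA, for brevity */
    MPI_Allgather(&disp, 1, MPI_AINT, all_disps, 1, MPI_AINT,
                  MPI_COMM_WORLD);

    MPI_Win_lock_all(0, win);

    /* In the real application this part runs concurrently in several
     * threads, each targeting a possibly different rank. */
    int target = (rank + 1) % size;
    int value = rank;
    MPI_Put(&value, 1, MPI_INT, target, all_disps[target], 1, MPI_INT, win);
    MPI_Win_flush(target, win);               /* "blocking" put */

    /* Make the remotely written value visible locally before reading. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_sync(win);
    printf("rank %d: local = %d\n", rank, local);

    MPI_Win_unlock_all(win);
    MPI_Win_detach(win, &local);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```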
When I use this combination (MPI_Put + MPI_Win_flush) in our application,
I am seeing threads occasionally hang in MPI_Win_flush, probably waiting
for progress to happen. However, with a small reproducer (attached; the
original application has multiple layers of abstraction), I am seeing
fatal errors in MPI_Win_flush when using more than one thread:
```
[beryl:18037] *** An error occurred in MPI_Win_flush
[beryl:18037] *** reported by process [4020043777,2]
[beryl:18037] *** on win pt2pt window 3
[beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
[beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will
now abort,
[beryl:18037] *** and potentially your MPI job)
```
I could only trigger this on dynamic windows with multiple
concurrent threads running.
So: Is this a valid MPI program (except for the missing clean-up at
the end ;))? It seems to run fine with MPICH, but maybe they are more
tolerant of some programming errors...
If it is a valid MPI program, I assume there is some race condition
in MPI_Win_flush that leads to the fatal error (or the hang that I
observe otherwise)?
I tested this with OpenMPI 1.10.5 on a single-node Linux Mint 18.1
system with stock kernel 4.8.0-36 (a.k.a. my laptop). OpenMPI and the
test were both compiled with GCC 5.3.0. I could not run it with
OpenMPI 2.0.2 due to the fatal error in MPI_Win_create (which also
applies to MPI_Win_create_dynamic; see my other thread, not sure
whether they are related).
Please let me know whether this is a valid use case and whether I can
provide any additional information.
Many thanks in advance!
Cheers
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart
Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
<ompi_flush_hang.c>