Nathan and I discussed this at the MPI Forum last week. I argued that your usage is not erroneous, although certain pathological cases (likely concocted) can lead to nasty behavior. He indicated that he would remove the error check, but it may require further discussion/debate with others.
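For context, the kind of usage under discussion (spelled out in the quoted messages below) boils down to several threads each pairing an MPI_Put with an MPI_Win_flush on the same MPI_Win_lock_all epoch, so one thread may be inside MPI_Put while another is inside MPI_Win_flush for the same target. A minimal sketch, not the attached reproducer (for brevity it uses MPI_Win_allocate rather than a dynamic window, and ranks/sizes are illustrative):

```
/* Sketch only: NTHREADS threads each do MPI_Put + MPI_Win_flush ("blocking
 * put") on the same passive-target epoch, so MPI_Put and MPI_Win_flush can
 * run concurrently for the same target. Compile with: mpicc -pthread ... */
#include <mpi.h>
#include <pthread.h>

#define NTHREADS 4

static MPI_Win win;
static int     target;   /* rank we put to */

static void *put_and_flush(void *arg)
{
    long tid   = (long)arg;
    int  value = (int)tid;
    /* assumed: displacement 'tid' is valid on the target window */
    MPI_Put(&value, 1, MPI_INT, target, (MPI_Aint)tid, 1, MPI_INT, win);
    MPI_Win_flush(target, win);  /* wait for remote completion of this put */
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    /* a real program would check provided == MPI_THREAD_MULTIPLE here */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    target = (rank + 1) % size;

    int *base;
    MPI_Win_allocate(NTHREADS * sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    MPI_Win_lock_all(0, win);

    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; ++i)
        pthread_create(&threads[i], NULL, put_and_flush, (void *)i);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(threads[i], NULL);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```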
You can remove the error check from the source and recompile if you are in a hurry, or you can use an MPICH derivative (I have not checked, but I doubt MPICH errors on this code).

Jeff

On Mon, Mar 6, 2017 at 8:30 AM, Joseph Schuchart <schuch...@hlrs.de> wrote:

> Ping :) I would really appreciate any input on my question below. I crawled through the standard but cannot seem to find the wording that prohibits thread-concurrent access and synchronization.
>
> Using MPI_Rget works in our case, but MPI_Rput only guarantees local completion, not remote completion. Specifically, a thread-parallel application would have to go into some serial region just to issue an MPI_Win_flush before a thread can read a value previously written to the same target. Re-reading remote values in the same process/thread might not be efficient but is a valid use case for us.
>
> Best regards,
> Joseph
>
> On 02/20/2017 09:23 AM, Joseph Schuchart wrote:
>
>> Nathan,
>>
>> Thanks for your clarification. Just so that I understand where my misunderstanding of this matter comes from: can you please point me to the place in the standard that prohibits thread-concurrent window synchronization using MPI_Win_flush[_all]? I cannot find such a passage in 11.5.4 (Flush and Sync), nor in 12.4 (MPI and Threads). The latter explicitly excludes waiting on the same request object (which MPI_Win_flush does not do) and collective operations on the same communicator (which MPI_Win_flush is not), but it fails to mention one-sided non-collective synchronization operations. Any hint would be much appreciated.
>>
>> We will look at MPI_Rput and MPI_Rget. However, having a single put paired with a flush is just the simplest case. We also want to support multiple asynchronous operations that are eventually synchronized on a per-thread basis, where keeping the request handles might not be feasible.
>>
>> Thanks,
>> Joseph
>>
>> On 02/20/2017 02:30 AM, Nathan Hjelm wrote:
>>
>>> You cannot perform synchronization at the same time as communication on the same target. This means that if one thread is in MPI_Put/MPI_Get/MPI_Accumulate (target), you can't have another thread in MPI_Win_flush (target) or MPI_Win_flush_all(). If your program is doing that, it is not a valid MPI program. If you want to ensure a particular put operation is complete, try MPI_Rput instead.
>>>
>>> -Nathan
>>>
>>> On Feb 19, 2017, at 2:34 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:
>>>>
>>>> All,
>>>>
>>>> We are trying to combine MPI_Put and MPI_Win_flush on locked (using MPI_Win_lock_all) dynamic windows to mimic a blocking put. The application is (potentially) multi-threaded and we are thus relying on MPI_THREAD_MULTIPLE support to be available.
>>>>
>>>> When I try to use this combination (MPI_Put + MPI_Win_flush) in our application, I am seeing threads occasionally hang in MPI_Win_flush, probably waiting for some progress to happen.
>>>> However, when I try to create a small reproducer (attached; the original application has multiple layers of abstraction), I am seeing fatal errors in MPI_Win_flush when using more than one thread:
>>>>
>>>> ```
>>>> [beryl:18037] *** An error occurred in MPI_Win_flush
>>>> [beryl:18037] *** reported by process [4020043777,2]
>>>> [beryl:18037] *** on win pt2pt window 3
>>>> [beryl:18037] *** MPI_ERR_RMA_SYNC: error executing rma sync
>>>> [beryl:18037] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
>>>> [beryl:18037] *** and potentially your MPI job)
>>>> ```
>>>>
>>>> I could only trigger this on dynamic windows with multiple concurrent threads running.
>>>>
>>>> So: is this a valid MPI program (except for the missing clean-up at the end ;))? It seems to run fine with MPICH, but maybe they are more tolerant of some programming errors...
>>>>
>>>> If it is a valid MPI program, I assume there is some race condition in MPI_Win_flush that leads to the fatal error (or to the hang that I observe otherwise)?
>>>>
>>>> I tested this with OpenMPI 1.10.5 on a single-node Linux Mint 18.1 system with stock kernel 4.8.0-36 (aka my laptop). OpenMPI and the test were both compiled using GCC 5.3.0. I could not run it with OpenMPI 2.0.2 due to the fatal error in MPI_Win_create (which also applies to MPI_Win_create_dynamic; see my other thread, not sure if they are related).
>>>>
>>>> Please let me know whether this is a valid use case and whether I can provide you with additional information if required.
>>>>
>>>> Many thanks in advance!
>>>>
>>>> Cheers
>>>> Joseph
>>>>
>>>> --
>>>> Dipl.-Inf. Joseph Schuchart
>>>> High Performance Computing Center Stuttgart (HLRS)
>>>> Nobelstr. 19
>>>> D-70569 Stuttgart
>>>>
>>>> Tel.: +49(0)711-68565890
>>>> Fax: +49(0)711-6856832
>>>> E-Mail: schuch...@hlrs.de
>>>>
>>>> <ompi_flush_hang.c>

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
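For completeness, here is a minimal sketch of the request-based alternative discussed above (MPI_Rget/MPI_Rput with a per-thread MPI_Wait). The window, target rank, and displacement are assumed to be set up elsewhere; as noted in the thread, a completed MPI_Rget request means the data has arrived locally, whereas a completed MPI_Rput request only guarantees local completion, so remote completion still needs a flush:

```
/* Sketch only: assumes 'win' was already locked with MPI_Win_lock_all and
 * that 'target'/'disp' are valid for that window. */
#include <mpi.h>

int read_remote_value(MPI_Win win, int target, MPI_Aint disp)
{
    int value;
    MPI_Request req;
    MPI_Rget(&value, 1, MPI_INT, target, disp, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* per-thread: data is now locally available */
    return value;
}

void write_remote_value(MPI_Win win, int target, MPI_Aint disp, int value)
{
    MPI_Request req;
    MPI_Rput(&value, 1, MPI_INT, target, disp, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* local completion only: 'value' may be reused */
    /* remote completion would still require MPI_Win_flush(target, win),
     * which is exactly the thread-concurrency question raised above */
}
```

This mirrors the point made in the quoted messages: per-thread reads can be synchronized through MPI_Rget requests, while remotely completing puts still funnels through MPI_Win_flush.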