On Mon, Apr 21, 2014 at 08:53:02AM +0200, Tobias Burnus wrote:
> Dear all,
>
> I would like to do one-sided communication as the implementation of
> a Fortran coarray library. "MPI provides three synchronization
> mechanisms:
>
> "1. The MPI_WIN_FENCE collective synchronization call supports a
> simple synchronization pattern that is often used in parallel
> computations: namely a loosely-synchronous model, where global
> computation phases alternate with global communication phases. [...]
>
> "2. The four functions MPI_WIN_START, MPI_WIN_COMPLETE,
> MPI_WIN_POST, and MPI_WIN_WAIT [...]
>
> "3. Finally, shared lock access is provided by the functions
> MPI_WIN_LOCK, MPI_WIN_LOCK_ALL, MPI_WIN_UNLOCK, and
> MPI_WIN_UNLOCK_ALL." (MPIv3, p. 438)
>
> I would like to use mechanism 1, but leaving out Win_lock/Win_unlock
> does not work. How is one supposed to use the first mechanism?
> (I haven't tried specifying "no_locks" for "info", but that
> shouldn't matter, should it?)
From what I understand you cannot mix active (PSCW, fence) and
passive (lock/unlock) access epochs on the same window at the same
time. It is fine to start an active epoch after leaving a passive
epoch, or vice versa. Remember that MPI_Win_fence can either start or
end an active access epoch, so a pure fence code needs no lock/unlock
calls at all; a minimal sketch is appended below. The semantics of
the synchronization mechanisms are well documented in the standard.
See MPI-3 § 11.5.

> Follow-up question: Is it semantically correct to have concurrent
> write access to adjacent array elements with method 1? I mean
> something like using an array of single-byte elements
> (integer(kind=1)) where process 2 sets (MPI_Win_put) the elements
> with odd array indexes and process 3 the ones with even indexes of
> an array located in process 1? By itself, each process writes to
> different memory locations, but the hardware typically cannot update
> single bytes atomically in memory, only chunks of (e.g.) 4 bytes.
> The problem is that this access is semantically permitted by Fortran
> while, at the same time, I do not want to do unnecessary locking. In
> practical terms, accessing the same window/array with MPI_Win_put
> simultaneously will occur for halo exchange - but it is unlikely to
> touch directly adjacent memory.

Yeah, I would be very careful with concurrent write access. Right
now, with osc/rdma, you can get away with different processes writing
to adjacent bytes, but optimized versions (coming in time for 1.9)
will probably not give you correct results.

> Secondly, I probably missed something, but is it possible to access
> arbitrary remote memory without having to call something collective
> (such as MPI_Win_create)? The use case is a derived type (C
> equivalent: struct) which contains a pointer. The derived
> type/struct (being a coarray in my case) has an associated MPI_Win -
> and I can hence obtain the address to which the pointer component
> points. That address I would then like to use for MPI_Put/MPI_Get -
> without the cooperation of the remote side and, in particular,
> without calling a collective on all processes. Any idea how to do
> this?

This is possible if the window was created with
MPI_Win_create_dynamic: only the window creation itself is
collective, and memory can afterwards be attached and detached
locally with MPI_Win_attach/MPI_Win_detach; a second sketch is
appended below. Be aware that you will want to keep the number of
attached regions small, since each attached region uses a potentially
limited resource. See MPI-3 § 11.2.4.
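To illustrate the fence-only usage asked about in the first question,
here is a minimal sketch (a hypothetical two-rank example, not code
from this thread; the buffer name and the value 42 are made up):
rank 0 puts an integer into rank 1's window, and the two fences alone
delimit the epochs - no MPI_Win_lock/MPI_Win_unlock anywhere.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collectively expose one int per process; run with >= 2 ranks. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);    /* opens the access/exposure epoch */

    if (rank == 0) {
        int val = 42;
        MPI_Put(&val, 1, MPI_INT, 1 /* target */, 0 /* disp */,
                1, MPI_INT, win);
    }

    MPI_Win_fence(0, win);    /* closes the epoch; the put is complete */

    if (rank == 1)
        printf("rank 1 received %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

The assert argument to MPI_Win_fence is 0 here for simplicity; hints
such as MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED can be passed to
let the implementation optimize the fences.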
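And here is a sketch of the dynamic-window approach, again a
hypothetical two-rank example: the only collective call is
MPI_Win_create_dynamic. Rank 1 attaches locally allocated memory,
ships its absolute address to rank 0 with an ordinary send/recv, and
rank 0 targets that address from a passive-target (lock/unlock) epoch
with no further participation from rank 1.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    int *data = NULL;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The only collective window call; no memory is exposed yet. */
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 1) {
        /* Local, non-collective: allocate and attach a region, then
           ship its absolute address to rank 0. */
        data = malloc(sizeof(int));
        *data = 0;
        MPI_Win_attach(win, data, sizeof(int));

        MPI_Aint addr;
        MPI_Get_address(data, &addr);
        MPI_Send(&addr, 1, MPI_AINT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Aint addr;
        MPI_Recv(&addr, 1, MPI_AINT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Passive-target epoch: rank 1 does not participate. For a
           dynamic window the target displacement is the absolute
           address obtained on the target with MPI_Get_address. */
        int val = 42;
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Put(&val, 1, MPI_INT, 1, addr, 1, MPI_INT, win);
        MPI_Win_unlock(1, win);
    }

    MPI_Barrier(MPI_COMM_WORLD);    /* order the put before the read */

    if (rank == 1) {
        printf("rank 1 now holds %d\n", *data);
        MPI_Win_detach(win, data);
        free(data);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

The direct read of *data on rank 1 assumes the unified memory model;
in the separate model rank 1 would need its own window
synchronization (e.g. a lock/unlock on itself) before reading the
result.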
-Nathan Hjelm
HPC-5, LANL