This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock. Some might call what I'm about to describe erroneous. I wrote the one-sided code in Open MPI and may be among those people.
In both implementations, one-sided communication is not necessarily truly asynchronous. That is, the target of an operation may have to enter the MPI library (MPI_Wtime does not count as entering the library in this case) to progress lock/unlock calls.

So rank 2 calls lock (which is a no-op in both implementations), calls put, calls unlock, and waits for a response. Ranks 0 and 1 wait for a second and then enter lock, get, and unlock. At this point, data actually starts to move. Chances are, rank 0 is going to process its request first, hence the get from rank 0 returning 0. Then rank 0 will perhaps process some other requests before it leaves unlock (perhaps not), and enter barrier. At this point, it will progress everything until the other ranks enter barrier, meaning rank 2's put and the gets from ranks 1 and 2 will finally be processed.

In case you're wondering, the specification wasn't disobeyed in the communication order; the lock description is very loose and is relative to other MPI events. So if you put the barrier before the lock/get/unlock, you'd get the answer you wanted, because rank 2's lock would have to occur before rank 0's. With no other MPI synchronization, there's no requirement that be true, and the locking order could be 0, 1, 2, 2 if it really wanted to be (i.e., it would be perfectly legal for rank 1 to also return 0).

This is obviously not ideal, and it is one of the areas of focus for the MPI-3 standardization effort. In Open MPI, adding true asynchronous behavior is difficult. The original design assumed that the lowest communication layers would be able to provide asynchronous completion events to progress the one-sided implementation. Thus far, only the authors of the TCP stack have provided such behavior, and it's not as well tested as other modes of operation.

Brian

On 4/13/11 12:31 PM, "Abhishek Kulkarni" <abbyzc...@gmail.com> wrote:

>Hello,
>
>I am trying to better understand the semantics of passive synchronization
>in one-sided communication calls.
>Doesn't MPI_Win_unlock()
>block to ensure that all the preceding RMA calls in that epoch have been
>synced?
>
>In that case, why is an undefined value returned when trying to load from
>a local window? (see below)
>
>    MPI_Alloc_mem(128, MPI_INFO_NULL, &ptr);
>    MPI_Win_create(ptr, 128, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>
>    // write to the target window of the head node
>    if (rank == (size - 1)) {
>        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
>        in = 10;
>        MPI_Put(&in, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
>
>        MPI_Win_unlock(0, win);
>    } else {
>        // busy wait
>        start = MPI_Wtime();
>        end = MPI_Wtime();
>        while ((end - start) < 1)
>            end = MPI_Wtime();
>    }
>
>    // read from the head node's window
>    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
>    MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
>    MPI_Win_unlock(0, win);
>
>    MPI_Barrier(MPI_COMM_WORLD);
>
>    printf("R%d: %d\n", rank, out);
>
>The output of the above program with 1.5.3rc1 (and also with MPICH2
>1.4rc2) is:
>R2: 10
>R1: 10
>R0: 0
>
>whereas I expect to see:
>R2: 10
>R1: 10
>R0: 10
>
>Thanks,
>Abhishek
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
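
For anyone who wants to experiment with the lock-ordering argument in the reply, here is a toy model in plain C. It is not MPI code and it is not how Open MPI or MPICH2 work internally; it only assumes that each exclusive lock/op/unlock epoch is applied to rank 0's window atomically, in whatever order the target happens to grant the locks. The names `struct epoch`, `EPOCHS`, and `replay` are made up for this sketch, and the window is assumed to start at 0 (the real buffer from MPI_Alloc_mem is uninitialized).

```c
#include <assert.h>

/* Toy model, NOT MPI: one epoch is one lock/op/unlock sequence on rank 0's window. */
enum op_kind { OP_PUT, OP_GET };

struct epoch {
    int origin;         /* rank that issued lock/op/unlock */
    enum op_kind kind;  /* the single RMA operation inside the epoch */
    int value;          /* value written by a put; unused for a get */
};

/* The four epochs of the 3-rank run from the quoted program. */
static const struct epoch EPOCHS[4] = {
    {2, OP_PUT, 10},  /* index 0: rank 2's put */
    {0, OP_GET, 0},   /* index 1: rank 0's get */
    {1, OP_GET, 0},   /* index 2: rank 1's get */
    {2, OP_GET, 0},   /* index 3: rank 2's get */
};

/*
 * Replay the epochs against a one-int window in lock-grant order.
 * order[]  : indices into EPOCHS, in the order the target grants the locks
 * initial  : assumed starting contents of the window (hypothetical)
 * out[r]   : receives the value rank r's get observed
 * The model does not enforce that index 0 precedes index 3 (rank 2's own
 * program order); callers should only pass orders that respect it.
 */
void replay(const int order[4], int initial, int out[3])
{
    int window = initial;
    for (int i = 0; i < 4; i++) {
        assert(order[i] >= 0 && order[i] < 4);
        const struct epoch *e = &EPOCHS[order[i]];
        if (e->kind == OP_PUT)
            window = e->value;        /* put overwrites the window */
        else
            out[e->origin] = window;  /* get reads the current contents */
    }
}
```

With an initial value of 0, the order {1, 0, 2, 3} (rank 0's get granted first) gives R0=0, R1=10, R2=10, which matches the reported output. The order {1, 2, 0, 3} corresponds to the locking order 0, 1, 2, 2 and gives R0=0, R1=0, R2=10. The order {0, 1, 2, 3}, which is what a barrier before the lock/get/unlock would force, gives 10 on every rank.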