All,

We employ the following pattern to send signals between processes:

```
int com_rank, root = 0;
int res = 0, val = 1;
// allocate MPI window (a passive-target epoch is assumed to be open)
MPI_Win win = allocate_win();
// do some computation
...
// Process 0 waits for a signal
if (com_rank == root) {
  do {
    MPI_Fetch_and_op(NULL, &res,
      MPI_INT, com_rank, 0, MPI_NO_OP, win);
    MPI_Win_flush_local(com_rank, win);
  } while (res == 0);
} else {
  // all other processes signal the root
  MPI_Accumulate(&val, 1, MPI_INT,
    root, 0, 1, MPI_INT, MPI_SUM, win);
  MPI_Win_flush(root, win);
}
[...]
```

We use MPI_Fetch_and_op with MPI_NO_OP to atomically read the signal flag from the root's local window memory, and MPI_Accumulate with MPI_SUM to send the signal (I have omitted the reset and other details for simplicity).
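
For completeness, here is a minimal self-contained sketch of the pattern (assuming allocate_win() boils down to MPI_Win_allocate of a single zero-initialized int plus MPI_Win_lock_all to open the passive-target epoch; the reset is still omitted):

```
#include <mpi.h>

int main(int argc, char **argv)
{
  int com_rank, root = 0;
  int *baseptr;
  MPI_Win win;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &com_rank);

  // one-int window; the signal flag lives on the root
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &baseptr, &win);
  *baseptr = 0;                 // clear the flag before anyone signals
  MPI_Win_lock_all(0, win);     // open a passive-target epoch on all ranks
  MPI_Barrier(MPI_COMM_WORLD);  // ensure the cleared flag is visible

  if (com_rank == root) {
    int res = 0;
    do {                        // poll the local flag atomically
      MPI_Fetch_and_op(NULL, &res, MPI_INT, com_rank, 0, MPI_NO_OP, win);
      MPI_Win_flush_local(com_rank, win);
    } while (res == 0);
  } else {
    int val = 1;                // raise the flag on the root
    MPI_Accumulate(&val, 1, MPI_INT, root, 0, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_flush(root, win);
  }

  MPI_Win_unlock_all(win);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}
```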

When running on a single node (my laptop), this code snippet reproducibly hangs: the root process spins indefinitely in the do-while loop while all other processes are stuck in MPI_Win_flush.

An interesting observation is that if I replace the MPI_Win_flush_local with MPI_Win_flush, the application does not hang. However, my understanding is that a local flush should be sufficient for MPI_Fetch_and_op with MPI_NO_OP, since remote completion is not required for a read.
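
That is, the polling loop on the root becomes:

```
do {
  MPI_Fetch_and_op(NULL, &res,
    MPI_INT, com_rank, 0, MPI_NO_OP, win);
  MPI_Win_flush(com_rank, win); // full flush instead of flush_local
} while (res == 0);
```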

I do not observe this hang with MPICH 3.2, and I am aware that the progress semantics of MPI are rather vague. Still, I am curious: is this difference in behavior intended, and should repeatedly calling into MPI communication functions (that do not block) provide progress for incoming RMA operations?
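
For illustration, a hypothetical variant of the polling loop that additionally calls the non-blocking MPI_Iprobe on each iteration, purely to drive the progress engine (not something our code currently does):

```
int flag;
do {
  // non-blocking call into MPI, intended only to trigger progress
  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
             &flag, MPI_STATUS_IGNORE);
  MPI_Fetch_and_op(NULL, &res, MPI_INT, com_rank, 0, MPI_NO_OP, win);
  MPI_Win_flush_local(com_rank, win);
} while (res == 0);
```

My question is whether such calls (or the repeated MPI_Fetch_and_op itself) are supposed to progress incoming RMA operations.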

Any input is much appreciated.

Cheers
Joseph