Hi everybody, in our group we are currently working with a 2D CFD application that is based on the simple von Neumann neighborhood. The 2D data grid is partitioned into horizontal stripes, and each process computes one stripe. After each iteration, a process exchanges its upper and lower boundary rows with its neighbor processes.
The application is optimized to compute the boundary first, exchange it with the neighbors, and then compute the inner part of the block. We use one-sided communication to transfer the boundary data. In pseudocode:

for each time step:
    compute boundary
    (A) use MPI_Win_post, MPI_Win_start, MPI_Put to transfer/receive the boundary to/from the neighbor processes
    (B) compute inner parts
    (C) call MPI_Win_complete and MPI_Win_wait to finish the access/exposure epoch

We found out that the default behavior of MPICH2's CH3 channel implementation is to enqueue RMA operations until the closing synchronization call (wait/complete in our case). So there is no opportunity to overlap communication (A) with computation (B).

Now my beginner's question is: how can we achieve (if possible) an overlap of communication (A) with computation (B) in OpenMPI? Do we need to tune any btl or osc parameters of OpenMPI? Or is this overlap possible by design/implementation, so that we really don't have to care?

We use OpenMPI 1.4.3, Mellanox MHGH19-XTC ConnectXZ cards and a Mellanox MTS3600R-1UNC switch. IPoIB is not activated. The output of "ompi_info --param all all" is attached.

Thanks for replies!

Steffen
Attachment: openmpi_info.tar.gz (GNU Zip compressed data)