Hi there,
I have attached a small piece of code which reproduces a "bug?" that
has been bothering me a lot. Issuing several calls to
MPI_WIN_LOCK/UNLOCK seems to hang some processes until an MPI_BARRIER
is encountered!?
My experience with MPI is very modest, so I apologize in advance if I
misread the MPI-2 specs, but it looks to me like what I want to do is
valid.
If you look at the file hangs.F90, the code starts with several calls
to LOCK/UNLOCK and then goes on with, let's say, a big piece of work
between the "start action" and "action done" messages. For the purpose
of this example, that work is a loop that spins for 10 s.
I don't want to put a barrier after the calls to LOCK/UNLOCK because I
want the code to run asynchronously. Also note that I don't need a
mutex or anything similar; all those calls can be done simultaneously
and in any order. My only problem is the following hang.
Here is the output when the code runs on an SMP machine (8 cores) with
an increasing number of processes (the same occurs with distributed
memory).
mpirun -np 1 ./hangs
start action for rank= 0
(10 secs later)
action done for rank= 0
<----works as I expect.
mpirun -np 2 ./hangs
start action for rank= 1
start action for rank= 0
(10 secs later)
action done for rank= 1
action done for rank= 0
<----so far so good; but with more processes the "bug?" appears:
mpirun -np 3 ./hangs
start action for rank= 1
start action for rank= 0
(10 secs later)
action done for rank= 0
action done for rank= 1
start action for rank= 2
(10 secs later)
action done for rank= 2
Process 2 remained stuck in the MPI_WIN_UNLOCK call until 0 and 1
reached the MPI_BARRIER call, which effectively makes the execution
serial :)
I tested with up to 8 processes and the problem becomes even worse: a
random number of processes get stuck in MPI_WIN_UNLOCK. However, this
does not occur on every execution. Sometimes, though rarely, all the
processes are released from the UNLOCK as expected.
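To quantify how long each rank is blocked, the unlock call could be
wrapped in a timer. The following is only a sketch: the subroutine name
timed_unlock and its argument list are hypothetical and not part of
hangs.F90; MPI_WTIME is the standard MPI wall-clock function declared in
mpif.h.

```fortran
! Hypothetical helper, not part of hangs.F90: wraps MPI_WIN_UNLOCK and
! prints how long the calling rank spent inside the call.
subroutine timed_unlock(targrank, win, myrank)
  implicit none
  include "mpif.h"
  integer, intent(in) :: targrank   ! rank whose window lock is released
  integer, intent(in) :: win        ! window handle (WinOnQ in hangs.F90)
  integer, intent(in) :: myrank     ! calling rank, used only in the message
  integer :: code
  double precision :: t0, t1

  t0 = MPI_WTIME()                  ! wall-clock time before the unlock
  call MPI_WIN_UNLOCK(targrank, win, code)
  t1 = MPI_WTIME()                  ! wall-clock time once the unlock returns

  write(*,'(a,i0,a,i0,a,f8.3,a)') 'rank ', myrank, ': unlock of target ', &
       targrank, ' took ', t1 - t0, ' s'
end subroutine timed_unlock
```

In hangs.F90 the MPI_WIN_UNLOCK call inside the loop would then become
"call timed_unlock(targrank,WinOnQ,myrank)". If the behaviour described
above is what happens, a stuck rank would be expected to report a time
close to the 10 s of the work loop.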
Additionally, if an MPI_BARRIER is issued just after the
MPI_WIN_UNLOCK calls, the problem disappears; but I never read in the
MPI-2 specs that this should be required, and it would completely
defeat the purpose of performing asynchronous operations.
Versions: gcc/gfortran 4.6.3, Open MPI 1.4.5
Please let me know if this behaviour can be fixed and if you need
additional information!
Thanks in advance,
Cheers,
Chris.
program hangs
  implicit none
  include "mpif.h"

  integer :: myrank, nsize, code
  integer :: targrank
  integer :: timein,timenow
  integer :: WinOnQ
  integer :: IntSize,DispUnit
  integer(MPI_ADDRESS_KIND) :: WinSize
  integer(MPI_ADDRESS_KIND), parameter :: ZeroDisplace = 0
  integer, parameter :: NullAssert = 0
  integer, parameter :: CountOne = 1
  integer, parameter :: QLockFlag = -1
  integer :: GetQRdma
  integer, volatile :: QrdmaAddress

  call MPI_INIT(code)
  call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,code)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nsize,code)
  QrdmaAddress = 10

  !open win: expose one integer per rank
  call MPI_TYPE_SIZE(MPI_INTEGER,IntSize,code)
  WinSize = IntSize !one element
  DispUnit = IntSize
  call MPI_WIN_CREATE(QrdmaAddress,WinSize,DispUnit,MPI_INFO_NULL &
       ,MPI_COMM_WORLD,WinOnQ,code)

  !rdma lock/get/put/unlock operations on every other rank
  do targrank = 0,nsize-1
     if (myrank.eq.targrank) cycle
     ! write(*,*)'lock/get/put || orig= targ= ',myrank,targrank
     call MPI_WIN_LOCK(MPI_LOCK_EXCLUSIVE,targrank,NullAssert,WinOnQ,code)
     call MPI_GET(GetQrdma,CountOne,MPI_INTEGER,targrank,ZeroDisplace &
          ,CountOne,MPI_INTEGER,WinOnQ,code)
     call MPI_PUT(QLockFlag,CountOne,MPI_INTEGER,targrank,ZeroDisplace &
          ,CountOne,MPI_INTEGER,WinOnQ,code)
     call MPI_WIN_UNLOCK(targrank,WinOnQ,code)
     ! write(*,*)'unlocked || orig= targ= ',myrank,targrank
     ! write(*,*)'orig= targ= GetQrdma= ',myrank,targrank,GetQrdma
  enddo

  !this fixes the issue, but I don't want sync here obviously
  ! call MPI_BARRIER(MPI_COMM_WORLD,code)

  write(*,*)
  write(*,*)
  write(*,*)'start action for rank= ',myrank
  !busy-wait for 10 s; time() is a GNU extension
  timein = time()
  timenow = timein !initialize before the first loop test
  do while ((timenow-timein).lt.10)
     timenow = time()
  end do
  write(*,*) 'action done for rank= ',myrank

  call MPI_WIN_FREE(WinOnQ,code)
  call MPI_BARRIER(MPI_COMM_WORLD,code)
  call MPI_FINALIZE(code)
end program hangs
# >>> DESIGNED FOR GMAKE <<<
FC=mpif90
FFLAGS= -O -fopenmp

hangs : hangs.o
	$(FC) $(FFLAGS) hangs.o -o $@

%.o: %.F90
	$(FC) $(FFLAGS) $(INCLUDE) -c $<

clean:
	rm -f hangs *.o *.mod