Hi, I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3). An strace of one of the processes shows:
Process 10925 attached with 3 threads - interrupt to quit
[pid 10927] poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN}], 2, -1 <unfinished ...>
[pid 10926] select(15, [8 14], [], NULL, NULL <unfinished ...>
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
...

The program is a Fortran program using 64-bit integers (compiled with -i8), and I correspondingly compiled Open MPI 1.4.3 with -i8 for the Fortran compiler as well. The program is difficult to debug since it takes 3 days to reach the point where it hangs.

This is what I have found so far: MPI_Allreduce is called as

   call MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE_PRECISION, &
                      MPI_SUM, MPI_COMM_WORLD, mpierr)

with count = 455295488. Since the Fortran interface just calls the C routines in Open MPI, and count arguments are 32-bit integers in C, I started to wonder what the largest "count" is for which an MPI_Allreduce succeeds. For example, in MPICH (it has been a while since I looked into this, so this may or may not still be correct) all sends/receives were converted into sends/receives of MPI_BYTE, so the largest count for doubles was (2^31-1)/8 = 268435455.

Thus I started to wrap the MPI_Allreduce call in a myMPI_Allreduce routine that calls MPI_Allreduce repeatedly whenever the count is larger than some value maxallreduce (the myMPI_Allreduce.f90 is attached).
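To make the back-of-the-envelope arithmetic above concrete (this is just a standalone sanity check, not part of the program):

```python
# Count limits when a 32-bit signed C int is involved somewhere in the path.
INT32_MAX = 2**31 - 1      # largest signed 32-bit value
count = 455295488          # the count passed to MPI_Allreduce
bytes_per_double = 8

# If the library internally counts bytes in a 32-bit int, the largest
# double-precision count that still fits is:
max_doubles = INT32_MAX // bytes_per_double
print(max_doubles)                            # 268435455

# The actual transfer size in bytes overflows a signed 32-bit integer:
print(count * bytes_per_double)               # 3642363904
print(count * bytes_per_double > INT32_MAX)   # True
```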
I have tested the routine with a trivial program that just fills an array with numbers and calls myMPI_Allreduce, and this test succeeds. However, with the real program the situation is very strange: when I set maxallreduce = 268435456, the program hangs at the first call (iallreduce = 1) to MPI_Allreduce in the do loop

   do iallreduce = 1, nallreduce - 1
      idx = (iallreduce - 1)*length + 1
      call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                         datatype, op, comm, mpierr)
      if (mpierr /= MPI_SUCCESS) return
   end do

With maxallreduce = 134217728 the first call succeeds, but the second hangs. For maxallreduce = 67108864 the first two calls to MPI_Allreduce complete, but the third (iallreduce = 3) hangs. For maxallreduce = 8388608 the 17th call hangs; for 1048576 the 138th. Here is a table (values from gdb attached to process 0 when the program hangs):

   maxallreduce   iallreduce         idx      length
      268435456            1           1   227647744
      134217728            2   113823873   113823872
       67108864            3   130084427    65042213
        8388608           17   137447697     8590481
        1048576          138   143392010     1046657

It is as if there are some elements in the middle of the array, with idx >= 143392010, that cannot be sent or received. Has anybody seen this kind of behaviour? Does anybody have an idea what could be causing this, or how to get around it? Anything that could help would be appreciated -- I have already spent a huge amount of time on this and am running out of ideas.

Cheers,
Martin

--
Martin Siegert
Simon Fraser University
Burnaby, British Columbia
Canada
subroutine myMPI_Allreduce(sendbuf, recvbuf, cnt, datatype, op, comm, mpierr)
   implicit none
   include 'mpif.h'
   double precision :: sendbuf(*), recvbuf(*)
   integer, parameter :: k8 = selected_int_kind(18)
   integer (kind=k8) :: cnt
   integer :: datatype, op, comm, mpierr
   integer, parameter :: maxallreduce = 134217728
   integer :: nallreduce, iallreduce, length, idx
   logical :: inplace

   if (cnt <= maxallreduce) then
      call MPI_Allreduce(sendbuf, recvbuf, cnt, datatype, op, comm, mpierr)
   else
      ! split cnt into nallreduce chunks of (nearly) equal length
      nallreduce = cnt/maxallreduce
      if (nallreduce*maxallreduce /= cnt) then
         nallreduce = nallreduce + 1
         length = cnt/nallreduce
         if (length*nallreduce /= cnt) then
            length = length + 1
         end if
      else
         length = maxallreduce
      end if
      inplace = sendbuf(1) == MPI_IN_PLACE
      if (inplace) then
         do iallreduce = 1, nallreduce - 1
            idx = (iallreduce - 1)*length + 1
            call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                               datatype, op, comm, mpierr)
            if (mpierr /= MPI_SUCCESS) return
         end do
      else
         do iallreduce = 1, nallreduce - 1
            idx = (iallreduce - 1)*length + 1
            call MPI_Allreduce(sendbuf(idx), recvbuf(idx), length, &
                               datatype, op, comm, mpierr)
            if (mpierr /= MPI_SUCCESS) return
         end do
      end if
      ! final chunk holds whatever remains
      idx = (nallreduce - 1)*length + 1
      length = cnt - idx + 1
      if (inplace) then
         call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                            datatype, op, comm, mpierr)
      else
         call MPI_Allreduce(sendbuf(idx), recvbuf(idx), length, &
                            datatype, op, comm, mpierr)
      end if
   end if
   return
end
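Since the suspicious idx/length values come straight from this integer arithmetic, the chunking logic can be checked without MPI at all. Here is a small Python transcription of the routine's splitting (a verification sketch only; the names mirror the Fortran). It reproduces, e.g., the maxallreduce = 134217728 row of the table above, and confirms that the chunks tile 1..cnt exactly, with no gap or overlap:

```python
def chunks(cnt, maxallreduce):
    """Yield the (idx, length) pairs myMPI_Allreduce passes to MPI_Allreduce."""
    if cnt <= maxallreduce:
        yield 1, cnt
        return
    # same integer arithmetic as the Fortran routine above
    nallreduce = cnt // maxallreduce
    if nallreduce * maxallreduce != cnt:
        nallreduce += 1
        length = cnt // nallreduce
        if length * nallreduce != cnt:
            length += 1
    else:
        length = maxallreduce
    for iallreduce in range(1, nallreduce):
        yield (iallreduce - 1) * length + 1, length
    # final chunk holds whatever remains
    idx = (nallreduce - 1) * length + 1
    yield idx, cnt - idx + 1

cnt = 455295488

# second chunk for maxallreduce = 134217728, as in the table:
assert list(chunks(cnt, 134217728))[1] == (113823873, 113823872)

# the chunks cover every element 1..cnt exactly once:
for m in (1048576, 8388608, 67108864, 134217728, 268435456):
    pieces = list(chunks(cnt, m))
    assert pieces[0][0] == 1
    assert all(i + n == j for (i, n), (j, _) in zip(pieces, pieces[1:]))
    assert pieces[-1][0] + pieces[-1][1] - 1 == cnt
```

So the decomposition itself is consistent, which is why I suspect the hang lies in the transfers, not in the splitting arithmetic.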