Hi,

I am debugging a program that hangs in MPI_Allreduce (openmpi-1.4.3).
An strace of one of the processes shows:

Process 10925 attached with 3 threads - interrupt to quit
[pid 10927] poll([{fd=17, events=POLLIN}, {fd=16, events=POLLIN}], 2, -1 <unfinished ...>
[pid 10926] select(15, [8 14], [], NULL, NULL <unfinished ...>
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
[pid 10925] poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
...

The program is a Fortran program using 64-bit integers (compiled with -i8),
and I correspondingly compiled openmpi (version 1.4.3) with -i8 for
the Fortran compiler as well.

The program is somewhat difficult to debug since it takes 3 days to reach
the point where it hangs. This is what I found so far:

MPI_Allreduce is called as

call MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, MPI_COMM_WORLD, mpierr)

with count = 455295488. Since the Fortran interface just calls the
C routines in OpenMPI, and count arguments are 32-bit integers in C, I started
to wonder what the largest "count" is for which an MPI_Allreduce
succeeds. E.g., in MPICH (it has been a while since I looked into this, so
this may no longer be correct) all send/recv calls were converted
into send/recv of MPI_BYTE, so the largest count for doubles was
(2^31-1)/8 = 268435455. I therefore wrapped the MPI_Allreduce call
in a myMPI_Allreduce routine that calls MPI_Allreduce repeatedly when
the count is larger than some value maxallreduce (myMPI_Allreduce.f90
is attached). I have tested the routine with a trivial program that
just fills an array with numbers and calls myMPI_Allreduce, and this
test succeeds.
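
The limits discussed above can be checked with a few lines of integer
arithmetic. This is only a sketch: it assumes that a C int is 32 bits wide
and that a double precision element occupies 8 bytes, and it says nothing
about whether OpenMPI 1.4.3 actually converts counts to bytes internally.
The program name and the constants int32_max and elem_bytes are made up
for this illustration.

```fortran
program count_limits
   implicit none
   ! 64-bit integer kind, declared the same way as in the attached routine
   integer, parameter :: k8 = selected_int_kind(18)
   ! Largest signed 32-bit integer (assumed to be the range of a C int)
   integer(kind=k8), parameter :: int32_max = 2147483647_k8
   ! Assumed size in bytes of one double precision element
   integer(kind=k8), parameter :: elem_bytes = 8_k8
   ! Element count of the call that hangs
   integer(kind=k8), parameter :: cnt = 455295488_k8

   ! Does the element count itself fit into a 32-bit int?
   print *, 'count fits in 32 bits:      ', cnt <= int32_max
   ! Largest element count if sizes were tracked in bytes
   print *, 'largest count in bytes mode:', int32_max/elem_bytes
   ! Total message size in bytes and whether that fits into a 32-bit int
   print *, 'total bytes:                ', cnt*elem_bytes
   print *, 'byte total fits in 32 bits: ', cnt*elem_bytes <= int32_max
end program count_limits
```

The element count itself fits into 32 bits, but the byte total
(3642363904) does not, and 268435455 is the limit quoted above for
8-byte elements.
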
However, with the real program the situation is very strange:
When I set maxallreduce = 268435456, the program hangs at the first call
(iallreduce = 1) to MPI_Allreduce in the do loop

         do iallreduce = 1, nallreduce - 1
            idx = (iallreduce - 1)*length + 1
            call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                               datatype, op, comm, mpierr)
            if (mpierr /= MPI_SUCCESS) return
         end do

With maxallreduce = 134217728 the first call succeeds, the second hangs. 
For maxallreduce = 67108864, the first two calls to MPI_Allreduce complete, 
but the third (iallreduce = 3) hangs. For maxallreduce = 8388608 the
17th call hangs, for 1048576 the 138th call hangs; here is a table 
(values from gdb attached to process 0 when the program hangs):

maxallreduce iallreduce         idx        length
268435456             1           1     227647744
134217728             2   113823873     113823872
 67108864             3   130084427      65042213
  8388608            17   137447697       8590481
  1048576           138   143392010       1046657

It is as if there are one or more elements in the middle of the array, with
idx >= 143392010, that cannot be sent or received.
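
To narrow this down, the ranges touched by the hanging calls in the table
can be intersected. The sketch below assumes that each hanging call covers
elements idx .. idx+length-1 and that one contiguous region is responsible,
which is only a hypothesis. The program name hang_window is made up for
this illustration; the numbers are copied from the table.

```fortran
program hang_window
   implicit none
   integer, parameter :: k8 = selected_int_kind(18)
   integer, parameter :: nrows = 5
   ! idx column of the table (first element of each hanging call)
   integer(kind=k8), parameter :: idx(nrows) = (/ 1_k8, 113823873_k8, &
      130084427_k8, 137447697_k8, 143392010_k8 /)
   ! length column of the table (number of elements in each hanging call)
   integer(kind=k8), parameter :: length(nrows) = (/ 227647744_k8, &
      113823872_k8, 65042213_k8, 8590481_k8, 1046657_k8 /)
   integer(kind=k8) :: first, last

   ! The common window starts at the largest first element and ends at
   ! the smallest last element of all hanging calls.
   first = maxval(idx)
   last  = minval(idx + length - 1_k8)

   if (first <= last) then
      print *, 'elements common to all hanging calls:', first, last
   else
      print *, 'the hanging calls share no common element'
   end if
end program hang_window
```

With the table values the common window is 143392010 to 144438666, which is
exactly the range of the last row.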

Has anybody seen this kind of behaviour?
Does anybody have an idea what could be causing this?
Any ideas how to get around this?
Anything that could help would be appreciated ... I already spent a
huge amount of time on this and I am running out of ideas.

Cheers,
Martin

-- 
Martin Siegert
Simon Fraser University
Burnaby, British Columbia
Canada
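! myMPI_Allreduce.f90: split one large MPI_Allreduce into several calls of
! at most maxallreduce elements each.
subroutine myMPI_Allreduce(sendbuf, recvbuf, cnt, datatype, op, comm, mpierr)
implicit none
include 'mpif.h'
double precision :: sendbuf(*), recvbuf(*)
integer, parameter :: k8 = selected_int_kind(18)   ! 64-bit integer kind
integer (kind=k8) :: cnt                           ! total number of elements
integer :: datatype, op, comm, mpierr
integer, parameter :: maxallreduce = 134217728     ! largest count per call
! default integers; 64 bits wide when compiled with -i8
integer :: nallreduce, iallreduce, length, idx
logical :: inplace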
   if (cnt <= maxallreduce) then
      call MPI_Allreduce(sendbuf, recvbuf, cnt, datatype, op, comm, mpierr)
   else
      nallreduce = cnt/maxallreduce
      if (nallreduce*maxallreduce /= cnt) then
         nallreduce = nallreduce + 1
         length = cnt/nallreduce
         if (length*nallreduce /= cnt) then
            length = length + 1
         end if
      else
         length = maxallreduce
      end if
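      ! Caution: this compares the value stored in sendbuf(1) with the value
      ! of MPI_IN_PLACE, not their addresses, so a send buffer whose first
      ! element happens to compare equal is also treated as in-place.
      inplace = sendbuf(1) == MPI_IN_PLACE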
      if (inplace) then
         do iallreduce = 1, nallreduce - 1
            idx = (iallreduce - 1)*length + 1
            call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                               datatype, op, comm, mpierr)
            if (mpierr /= MPI_SUCCESS) return
         end do
      else
         do iallreduce = 1, nallreduce - 1
            idx = (iallreduce - 1)*length + 1
            call MPI_Allreduce(sendbuf(idx), recvbuf(idx), length, &
                               datatype, op, comm, mpierr)
            if (mpierr /= MPI_SUCCESS) return
         end do
      end if
      idx = (nallreduce - 1)*length + 1
      length = cnt - idx + 1
      if (inplace) then
         call MPI_Allreduce(MPI_IN_PLACE, recvbuf(idx), length, &
                            datatype, op, comm, mpierr)
      else
         call MPI_Allreduce(sendbuf(idx), recvbuf(idx), length, &
                            datatype, op, comm, mpierr)
      end if
   end if
   return
end
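
The splitting rule of the routine can also be checked without MPI. The
sketch below repeats the arithmetic for nallreduce, length and idx and
verifies that the chunks are contiguous, non-empty, no larger than
maxallreduce, and together cover all cnt elements. It does not exercise
MPI_Allreduce itself. The program name check_chunks and the variables
covered and this_len are made up for this illustration.

```fortran
program check_chunks
   implicit none
   integer, parameter :: k8 = selected_int_kind(18)
   integer(kind=k8), parameter :: cnt = 455295488_k8
   integer(kind=k8), parameter :: maxallreduce = 134217728_k8
   integer(kind=k8) :: nallreduce, length, idx, covered, i, this_len

   ! Same splitting rule as in myMPI_Allreduce
   nallreduce = cnt/maxallreduce
   if (nallreduce*maxallreduce /= cnt) then
      nallreduce = nallreduce + 1
      length = cnt/nallreduce
      if (length*nallreduce /= cnt) length = length + 1
   else
      length = maxallreduce
   end if

   covered = 0
   do i = 1, nallreduce
      idx = (i - 1)*length + 1
      if (i < nallreduce) then
         this_len = length
      else
         this_len = cnt - idx + 1      ! the last chunk takes the remainder
      end if
      ! Each chunk must start right after the previous one, must not be
      ! empty, and must not exceed the per-call limit.
      if (idx /= covered + 1 .or. this_len < 1 .or. &
          this_len > maxallreduce) then
         print *, 'bad chunk:', i, idx, this_len
         stop 1
      end if
      covered = covered + this_len
   end do

   print *, 'chunks:', nallreduce, ' length:', length
   print *, 'covered', covered, ' of', cnt
end program check_chunks
```

For cnt = 455295488 and maxallreduce = 134217728 this gives 4 chunks of
113823872 elements each, the same length as in the second row of the table.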
