Note also that coding the mpi_allreduce as:

   call mpi_allreduce(MPI_IN_PLACE, phim(0,1,1,1,grp), &
                      phim_size*im*jm*kmloc(coords(2)+1), &
                      mpi_real, mpi_sum, ang_com, ierr)

results in the same freeze at the 60th iteration.  (I don't recall why
array sections were being passed in the first place; it may simply have
been a mistake.)
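For reference, a stripped-down, standalone sketch of that in-place pattern
(a toy array and MPI_COMM_WORLD here, not the real phim and ang_com) looks
like:

   program inplace_block_reduce
      implicit none
      include 'mpif.h'
      integer :: ierr, rank
      real    :: buf(0:3,2,2)   ! stand-in for one contiguous slab of phim

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      buf = real(rank)

      ! MPI_IN_PLACE makes buf both send and receive buffer; the first
      ! element plus the total count describes the contiguous block.
      call mpi_allreduce(MPI_IN_PLACE, buf(0,1,1), size(buf), mpi_real, &
                         mpi_sum, MPI_COMM_WORLD, ierr)

      if (rank == 0) print *, 'each element holds the sum of ranks:', buf(0,1,1)
      call mpi_finalize(ierr)
   end program inplace_block_reduce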


On Thu, Sep 8, 2011 at 4:17 PM, Greg Fischer <greg.a.fisc...@gmail.com> wrote:

> I am seeing mpi_allreduce operations freeze execution of my code on some
> moderately sized problems.  The freeze does not manifest itself in every
> problem, and it occurs in a portion of the code that is repeated many
> times.  In the case discussed below, the freeze appears in the 60th
> iteration.
>
> The current test case that I'm looking at is a 64-processor job.  This
> particular mpi_allreduce call applies to all 64 processors, with each
> communicator in the call containing a total of 4 processors.  When I add
> print statements before and after the offending line, I see that all 64
> processors successfully make it to the mpi_allreduce call, but only 32
> successfully exit.  Stack traces on the other 32 yield something along the
> lines of the trace listed at the bottom of this message.  The call itself
> looks like:
>
>   call mpi_allreduce(MPI_IN_PLACE, &
>        phim(0:(phim_size-1),1:im,1:jm,1:kmloc(coords(2)+1),grp), &
>        phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)
>
> These messages are sized to remain under the 32-bit integer limit on the
> "count" parameter.  The intent is to perform the allreduce operation on a
> contiguous block of the array.  Previously, I had been passing an
> assumed-shape array (i.e., phim(:,:,:,:,grp)), but found some documentation
> indicating that this was potentially dangerous.  Changing from assumed- to
> explicit-shape arrays doesn't solve the problem.  However, if I declare an
> additional array and use separate send and receive buffers:
>
>   call mpi_allreduce(phim_local, phim_global, &
>        phim_size*im*jm*kmloc(coords(2)+1), mpi_real, mpi_sum, ang_com, ierr)
>   phim(:,:,:,:,grp) = phim_global
>
> Then the problem goes away and everything works normally.  Does anyone
> have any insight into what may be happening here?  I'm using "include
> 'mpif.h'" rather than the f90 module; could that explain this?
>
> Thanks,
> Greg
>
> Stack trace(s) for thread: 1
> -----------------
> [0] (1 processes)
> -----------------
> main() at ?:?
>   solver() at solver.f90:31
>     solver_q_down() at solver_q_down.f90:52
>       iter() at iter.f90:56
>         mcalc() at mcalc.f90:38
>           pmpi_allreduce__() at ?:?
>             PMPI_Allreduce() at ?:?
>               ompi_coll_tuned_allreduce_intra_dec_fixed() at ?:?
>                 ompi_coll_tuned_allreduce_intra_ring_segmented() at ?:?
>                   ompi_coll_tuned_sendrecv_actual() at ?:?
>                     ompi_request_default_wait_all() at ?:?
>                       opal_progress() at ?:?
> Stack trace(s) for thread: 2
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
>   btl_openib_async_thread() at ?:?
>     poll() at ?:?
> Stack trace(s) for thread: 3
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
>   service_thread_start() at ?:?
>     select() at ?:?
>
