Note also that coding the mpi_allreduce as:

    call mpi_allreduce(MPI_IN_PLACE,phim(0,1,1,1,grp), &
         phim_size*im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)

results in the same freezing behavior in the 60th iteration. (I don't recall why the array sections were being passed; possibly just a mistake.)

On Thu, Sep 8, 2011 at 4:17 PM, Greg Fischer <greg.a.fisc...@gmail.com> wrote:

> I am seeing mpi_allreduce operations freeze execution of my code on some
> moderately-sized problems. The freeze does not manifest itself in every
> problem. In addition, it is in a portion of the code that is repeated many
> times. In the problem discussed below, the freeze appears in the 60th
> iteration.
>
> The current test case that I'm looking at is a 64-processor job. This
> particular mpi_allreduce call applies to all 64 processors, with each
> communicator in the call containing a total of 4 processors. When I add
> print statements before and after the offending line, I see that all 64
> processors successfully make it to the mpi_allreduce call, but only 32
> successfully exit. Stack traces on the other 32 yield something along the
> lines of the trace listed at the bottom of this message. The call itself
> looks like:
>
>     call mpi_allreduce(MPI_IN_PLACE, &
>          phim(0:(phim_size-1),1:im,1:jm,1:kmloc(coords(2)+1),grp), &
>          phim_size*im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)
>
> These messages are sized to remain under the 32-bit integer size limitation
> for the "count" parameter. The intent is to perform the allreduce operation
> on a contiguous block of the array. Previously, I had been passing an
> assumed-shape array section (i.e., phim(:,:,:,:,grp)), but found some
> documentation indicating that was potentially dangerous. Making the change
> from assumed- to explicit-shape arrays doesn't solve the problem. However,
> if I declare an additional array and use separate send and receive buffers:
>
>     call mpi_allreduce(phim_local,phim_global, &
>          phim_size*im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)
>     phim(:,:,:,:,grp) = phim_global
>
> then the problem goes away and everything works normally. Does anyone have
> any insight into what may be happening here? I'm using "include 'mpif.h'"
> rather than the f90 module; does that potentially explain this?
>
> Thanks,
> Greg
>
> Stack trace(s) for thread: 1
> -----------------
> [0] (1 processes)
> -----------------
> main() at ?:?
> solver() at solver.f90:31
> solver_q_down() at solver_q_down.f90:52
> iter() at iter.f90:56
> mcalc() at mcalc.f90:38
> pmpi_allreduce__() at ?:?
> PMPI_Allreduce() at ?:?
> ompi_coll_tuned_allreduce_intra_dec_fixed() at ?:?
> ompi_coll_tuned_allreduce_intra_ring_segmented() at ?:?
> ompi_coll_tuned_sendrecv_actual() at ?:?
> ompi_request_default_wait_all() at ?:?
> opal_progress() at ?:?
> Stack trace(s) for thread: 2
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
> btl_openib_async_thread() at ?:?
> poll() at ?:?
> Stack trace(s) for thread: 3
> -----------------
> [0] (1 processes)
> -----------------
> start_thread() at ?:?
> service_thread_start() at ?:?
> select() at ?:?
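
For anyone who wants to reproduce the pattern, here is a minimal standalone sketch
of the two variants discussed above. It is not the actual solver code: the array
name, extents, group index, and mpi_comm_world (standing in for the ang_com
sub-communicator) are all placeholders.

    program allreduce_sketch
       implicit none
       include 'mpif.h'
       integer, parameter :: n1 = 4, n2 = 8        ! placeholder extents
       real    :: phim(n1, n2, 3)                  ! stand-in for the larger phim array
       real    :: sendbuf(n1, n2), recvbuf(n1, n2) ! buffers for the workaround
       integer :: ierr, grp

       call mpi_init(ierr)
       grp  = 2
       phim = 1.0

       ! In-place form (the variant that froze in the 60th iteration above):
       ! reduce one contiguous slab of phim directly, starting at its first element.
       call mpi_allreduce(MPI_IN_PLACE, phim(1,1,grp), n1*n2, &
                          mpi_real, mpi_sum, mpi_comm_world, ierr)

       ! Separate-buffer workaround (the variant that ran cleanly): copy the slab
       ! out, reduce into a distinct receive buffer, then copy the result back.
       sendbuf = phim(:,:,grp)
       call mpi_allreduce(sendbuf, recvbuf, n1*n2, &
                          mpi_real, mpi_sum, mpi_comm_world, ierr)
       phim(:,:,grp) = recvbuf

       call mpi_finalize(ierr)
    end program allreduce_sketch

Both calls reduce the same contiguous block of the same size; the only difference
is whether the send buffer aliases the receive buffer via MPI_IN_PLACE or is a
distinct temporary array.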