Greg Fischer wrote:
(I apologize in advance for the simplistic/newbie question.)
I'm performing an ALLREDUCE operation on a multi-dimensional array.
This operation is the biggest bottleneck in the code, and I'm wondering
if there's a way to do it more efficiently than what I'm doing now.
Here's a representative example of what's happening:
  ir=1
  do ikl=1,km
    do ij=1,jm
      do ii=1,im
        albuf(ir)=array(ii,ij,ikl,nl,0,ng)
        ir=ir+1
      enddo
    enddo
  enddo
  agbuf=0.0
  call mpi_allreduce(albuf,agbuf,im*jm*kmloc(coords(2)+1),mpi_real,mpi_sum,ang_com,ierr)
  ir=1
  do ikl=1,km
    do ij=1,jm
      do ii=1,im
        phim(ii,ij,ikl,nl,0,ng)=agbuf(ir)
        ir=ir+1
      enddo
    enddo
  enddo
Is there any way to just do this in one fell swoop, rather than
buffering, transmitting, and unbuffering? This operation is looped
over many times. Are there savings to be had here?
There are three steps here: buffering, transmitting, and unbuffering.
Any idea how the run time is distributed among those three steps?
E.g., if most of the time is spent in the MPI call, then combining all
three steps into one is unlikely to buy you much... and might even
hurt. In that case, there may be some tuning of the collective
algorithms to do instead. I don't have any experience doing this with
OMPI. I'm just saying it makes some sense to isolate the problem a
little bit more.
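
One way to isolate the three steps is to wrap each one in MPI_Wtime calls and accumulate the totals over many repetitions. The following is only a minimal sketch under stated assumptions, not the original code: the sizes im, jm, km and the repetition count nrep are made-up values, the six-dimensional array is reduced to three dimensions, and mpi_comm_world stands in for ang_com.

```fortran
program time_allreduce_steps
  use mpi
  implicit none
  ! Hypothetical sizes and repetition count; the original post does not give them.
  integer, parameter :: im = 64, jm = 64, km = 32, nrep = 100
  real, allocatable :: array(:,:,:), phim(:,:,:), albuf(:), agbuf(:)
  double precision :: t0, t_pack, t_comm, t_unpack
  integer :: ii, ij, ikl, ir, irep, ierr, rank

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)

  allocate(array(im,jm,km), phim(im,jm,km), albuf(im*jm*km), agbuf(im*jm*km))
  array = 1.0
  t_pack = 0.0d0
  t_comm = 0.0d0
  t_unpack = 0.0d0

  do irep = 1, nrep
    ! Step 1: buffering (pack the array into the send buffer).
    t0 = mpi_wtime()
    ir = 1
    do ikl = 1, km
      do ij = 1, jm
        do ii = 1, im
          albuf(ir) = array(ii,ij,ikl)
          ir = ir + 1
        enddo
      enddo
    enddo
    t_pack = t_pack + (mpi_wtime() - t0)

    ! Step 2: transmitting (the collective itself).
    ! mpi_comm_world is an assumption standing in for ang_com.
    t0 = mpi_wtime()
    call mpi_allreduce(albuf, agbuf, im*jm*km, mpi_real, mpi_sum, &
                       mpi_comm_world, ierr)
    t_comm = t_comm + (mpi_wtime() - t0)

    ! Step 3: unbuffering (unpack the result).
    t0 = mpi_wtime()
    ir = 1
    do ikl = 1, km
      do ij = 1, jm
        do ii = 1, im
          phim(ii,ij,ikl) = agbuf(ir)
          ir = ir + 1
        enddo
      enddo
    enddo
    t_unpack = t_unpack + (mpi_wtime() - t0)
  enddo

  ! Report accumulated wall-clock seconds per step on rank 0.
  if (rank == 0) then
    print '(a,f12.6)', 'pack      (s): ', t_pack
    print '(a,f12.6)', 'allreduce (s): ', t_comm
    print '(a,f12.6)', 'unpack    (s): ', t_unpack
  endif

  deallocate(array, phim, albuf, agbuf)
  call mpi_finalize(ierr)
end program time_allreduce_steps
```

Note that the time measured around the collective also includes any time a rank spends waiting for slower ranks to arrive. Adding a call to mpi_barrier just before the second timer starts is one way to separate that waiting from the cost of the reduction itself.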