On Sun, 2009-11-08 at 20:40 -0800, Martin Siegert wrote:
> Hi,
> 
> I am running into a problem with mpi_allreduce when large buffers
> are used. But this does not appear to be unique to mpi_allreduce; it
> occurs with mpi_send/mpi_recv as well; program is attached.
> 1) run this using MPI_Allreduce:

> allreduce completed 2.700941
> enter array size (integer; negative to stop):
> 320000000
> 
> At this point the program just hangs forever.

You could use padb (it's linked in my sig) to tell you where the
application is stuck. It could just be swapping.

> All programs/libraries are 64bit, interconnect is IB.
> I expect problems with sizes larger than 2^31-1, but these array sizes
> are still much smaller.

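Whilst the message counts are smaller than 2^31-1, you should be aware
that the message sizes in bytes are larger, as the counts are multiplied
by sizeof(double). With 8-byte doubles, 320000000 elements come to
2560000000 bytes, which is more than 2^31-1 (2147483647), so I wouldn't
rule out this theory.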

Also, you are mallocing at least 4GB per process, and the MPI library
quite possibly uses a large amount more for buffering, so it could be
that you are simply running out of memory.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
