Hi, I'm just following up on this to say that the problem was not
related to preconnection itself, but to very large memory usage in
jobs with a high CPU count.
Preconnecting was just sending off a large number of isend/irecv
messages and triggering the memory consumption.
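(For rough scale, assuming a full all-to-all preconnect: each of N
ranks posts about 2(N-1) nonblocking requests, so at N = 1000 that is
on the order of 2,000 outstanding requests per rank, and roughly
N(N-1)/2, i.e. about 500,000, point-to-point connections across the
whole job.)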
I tried experimenting a bit with XRC
At those sizes it is possible you are running into resource
exhaustion issues. Some of the resource exhaustion code paths still lead
to hangs. If the code does not need to be fully connected I would
suggest not using mpi_preconnect_mpi but instead tracking down why the
initial MPI_Allreduce hangs. I w
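A minimal sketch of that kind of check (illustrative only, not code
from the thread; the integer reduction and the per-rank prints are
arbitrary choices) is to have every rank announce itself around the
first MPI_Allreduce, so a hang can be pinned to the ranks that never
enter or never leave it:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nranks, in = 1, out = 0;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* Announce arrival so a hang can be localized to specific ranks. */
        printf("rank %d of %d entering MPI_Allreduce\n", rank, nranks);
        fflush(stdout);

        t0 = MPI_Wtime();
        MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        printf("rank %d done: sum=%d after %.3f s\n", rank, out, t1 - t0);
        fflush(stdout);

        MPI_Finalize();
        return 0;
    }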
Thanks, it's at least good to know that the behaviour isn't normal!
Could it be some sort of memory leak in the call? The code in
ompi/runtime/ompi_mpi_preconnect.c
looks reasonably safe, though maybe doing thousands of isend/irecv
pairs is causing problems with the buffer used in ptp mes
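For a sense of what that pattern involves, the preconnect step amounts
to something like the following all-pairs exchange (a simplified
sketch, not the actual ompi_mpi_preconnect.c source):

    #include <mpi.h>
    #include <stdlib.h>

    /* Simplified illustration of an all-pairs preconnect: force a
     * connection to every peer by exchanging zero-byte nonblocking
     * messages, then wait on all of them at once. */
    static void preconnect_all(MPI_Comm comm)
    {
        int rank, nranks, nreq = 0;
        char dummy = 0;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nranks);

        /* Two requests per peer: one isend and one irecv. */
        MPI_Request *reqs = malloc(2 * (size_t)nranks * sizeof(*reqs));

        for (int peer = 0; peer < nranks; peer++) {
            if (peer == rank)
                continue;
            MPI_Isend(&dummy, 0, MPI_CHAR, peer, 0, comm, &reqs[nreq++]);
            MPI_Irecv(&dummy, 0, MPI_CHAR, peer, 0, comm, &reqs[nreq++]);
        }

        /* All 2*(nranks-1) requests, and the connection state behind
         * them, are live at the same time here. */
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }

With N ranks each process keeps 2(N-1) requests, plus the associated
connection state, alive simultaneously, which lines up with the memory
growth described at the top of the thread.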
> On Oct 17, 2014, at 3:37 AM, Marshall Ward wrote:
>
> I currently have a numerical model that, for reasons unknown, requires
> preconnection to avoid hanging on an initial MPI_Allreduce call.
That is indeed odd - it might take a while for all the connections to form, but
it shouldn’t hang.
>
I currently have a numerical model that, for reasons unknown, requires
preconnection to avoid hanging on an initial MPI_Allreduce call. But
when we try to scale out beyond around 1000 cores, we are unable to
get past MPI_Init's preconnection phase.
To test this, I have a basic C program containing