> On Oct 17, 2014, at 3:37 AM, Marshall Ward <marshall.w...@gmail.com> wrote:
>
> I currently have a numerical model that, for reasons unknown, requires
> preconnection to avoid hanging on an initial MPI_Allreduce call.

That is indeed odd - it might take a while for all the connections to form, but it shouldn’t hang.
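If the built-in preconnect support turns out to be the problem at scale, you can usually get the same effect from user code: Open MPI generally establishes connections lazily on first contact, so exchanging a zero-byte message with every peer right after MPI_Init warms them all up before the first real collective. A rough sketch of that idea (my own reconstruction - it is not what the preconnect code actually does internally, and the file name is made up):

    /* warmup.c - user-level "preconnection" sketch: exchange a zero-byte
     * message with every peer right after MPI_Init so connections exist
     * before the first real collective.  Illustrative only - not what
     * Open MPI's preconnect code actually does internally. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, d;
        char sbuf = 0, rbuf = 0;
        int one = 1, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Stagger the peers: on step d, send to (rank + d) and receive
         * from (rank - d), so every rank touches every other rank once. */
        for (d = 1; d < size; d++) {
            int to   = (rank + d) % size;
            int from = (rank - d + size) % size;
            MPI_Sendrecv(&sbuf, 0, MPI_BYTE, to,   0,
                         &rbuf, 0, MPI_BYTE, from, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        /* The collective that reportedly hangs without preconnection. */
        MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("warmed up %d ranks, allreduce sum = %d\n", size, sum);

        MPI_Finalize();
        return 0;
    }

It builds and runs like any other MPI program, e.g. `mpicc warmup.c -o warmup` and `mpirun -np 16 ./warmup` (names and counts here are just for illustration).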
> But when we try to scale out beyond around 1000 cores, we are unable to
> get past MPI_Init's preconnection phase.
>
> To test this, I have a basic C program containing only MPI_Init() and
> MPI_Finalize() named `mpi_init`, which I compile and run using
> `mpirun -mca mpi_preconnect_mpi 1 mpi_init`.

I doubt preconnect has been tested in a rather long time, as I’m unaware of anyone still using it (we originally provided it for some legacy code that otherwise took a long time to initialize). However, I could give it a try and see what happens.

FWIW: because it was so targeted and hasn’t been used in a long time, the preconnect algo is really not very efficient. Still, it shouldn’t have anything to do with memory footprint.

> This preconnection seems to consume a large amount of memory, and is
> exceeding the available memory on our nodes (~2 GiB/core) as the number
> of cores gets into the thousands (~4000 or so). If we try to preconnect
> to around ~6000, we start to see hangs and crashes.
>
> A failed 5600-core preconnection gave this warning (~10k times) while
> hanging for 30 minutes:
>
>     [warn] opal_libevent2021_event_base_loop: reentrant invocation.
>     Only one event_base_loop can run on each event_base at once.
>
> A failed 6000-core preconnection job crashed almost immediately with
> the following error:
>
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file ras_tm_module.c at line 159
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file ras_tm_module.c at line 85
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file base/ras_base_allocate.c at line 187

This doesn’t have anything to do with preconnect - it indicates that mpirun was unable to open the Torque allocation file. However, it shouldn’t have “crashed”, but instead simply exited with an error message.

> Should we expect to use very large amounts of memory for
> preconnections of thousands of CPUs? And can these
>
> I am using Open MPI 1.8.2 on Linux 2.6.32 (CentOS) and an FDR InfiniBand
> network. This is probably not enough information, but I'll try to
> provide more if necessary. My knowledge of implementation is
> unfortunately very limited.
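As for the memory question: I don’t have hard numbers for 1.8.2, but with full preconnection every rank ends up holding a connection to each of the other N-1 ranks, so per-rank resources grow linearly with job size while the total number of connections grows quadratically. A back-of-the-envelope way to look at it - the per-connection figure below is a made-up placeholder, not a measured Open MPI number:

    /* connections.c - rough arithmetic for fully pre-connected jobs.
     * PER_CONN_BYTES is a hypothetical placeholder; the real
     * per-connection footprint depends on the BTL/interconnect and
     * would have to be measured. */
    #include <stdio.h>

    int main(void)
    {
        /* Job sizes mentioned in the report above. */
        const long long ranks[] = { 1000, 4000, 5600, 6000 };
        const long long PER_CONN_BYTES = 64 * 1024;   /* hypothetical */
        int i, ncases = (int)(sizeof(ranks) / sizeof(ranks[0]));

        for (i = 0; i < ncases; i++) {
            long long n        = ranks[i];
            long long per_rank = n - 1;            /* peers per rank  */
            long long pairs    = n * (n - 1) / 2;  /* unique pairs    */
            double mib = (double)(per_rank * PER_CONN_BYTES)
                         / (1024.0 * 1024.0);
            printf("%6lld ranks: %6lld conns/rank, %12lld pairs, "
                   "~%.0f MiB/rank at %lld KiB/conn\n",
                   n, per_rank, pairs, mib, PER_CONN_BYTES / 1024);
        }
        return 0;
    }

Whether that blows past ~2 GiB/core depends entirely on the real per-connection cost for your fabric, which you’d have to measure; the default lazy connection setup (i.e., without preconnect) avoids paying it for peers a rank never talks to.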