> On Oct 17, 2014, at 3:37 AM, Marshall Ward <marshall.w...@gmail.com> wrote:
>
> I currently have a numerical model that, for reasons unknown, requires
> preconnection to avoid hanging on an initial MPI_Allreduce call.

That is indeed odd - it might take a while for all the connections to form, but it shouldn’t hang.
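If the built-in preconnect support turns out to be the problem at scale, you can usually get the same effect from user code: Open MPI generally establishes connections lazily on first contact, so exchanging a zero-byte message with every peer right after MPI_Init warms them all up before the first real collective. A rough sketch of that idea (my own reconstruction - it is not what the preconnect code actually does internally, and the file name is made up):

    /* warmup.c - user-level "preconnection" sketch: exchange a zero-byte
     * message with every peer right after MPI_Init so connections exist
     * before the first real collective.  Illustrative only - not what
     * Open MPI's preconnect code actually does internally. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, d;
        char sbuf = 0, rbuf = 0;
        int one = 1, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Stagger the peers: on step d, send to (rank + d) and receive
         * from (rank - d), so every rank touches every other rank once. */
        for (d = 1; d < size; d++) {
            int to   = (rank + d) % size;
            int from = (rank - d + size) % size;
            MPI_Sendrecv(&sbuf, 0, MPI_BYTE, to,   0,
                         &rbuf, 0, MPI_BYTE, from, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        /* The collective that reportedly hangs without preconnection. */
        MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("warmed up %d ranks, allreduce sum = %d\n", size, sum);

        MPI_Finalize();
        return 0;
    }

It builds and runs like any other MPI program, e.g. `mpicc warmup.c -o warmup` and `mpirun -np 16 ./warmup` (names and counts here are just for illustration).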
> But when we try to scale out beyond around 1000 cores, we are unable to
> get past MPI_Init's preconnection phase.
>
> To test this, I have a basic C program containing only MPI_Init() and
> MPI_Finalize() named `mpi_init`, which I compile and run using
> `mpirun -mca mpi_preconnect_mpi 1 mpi_init`.

I doubt preconnect has been tested in a rather long time, as I’m unaware of anyone still using it (we originally provided it for some legacy code that otherwise took a long time to initialize). However, I could give it a try and see what happens.

FWIW: because it was so targeted and hasn’t been used in a long time, the preconnect algo is really not very efficient. Still, it shouldn’t have anything to do with memory footprint.

> This preconnection seems to consume a large amount of memory, and is
> exceeding the available memory on our nodes (~2 GiB/core) as the number
> of cores gets into the thousands (~4000 or so). If we try to preconnect
> to around ~6000, we start to see hangs and crashes.
>
> A failed 5600-core preconnection gave this warning (~10k times) while
> hanging for 30 minutes:
>
>     [warn] opal_libevent2021_event_base_loop: reentrant invocation.
>     Only one event_base_loop can run on each event_base at once.
>
> A failed 6000-core preconnection job crashed almost immediately with
> the following error:
>
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file ras_tm_module.c at line 159
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file ras_tm_module.c at line 85
>     [r104:18459] [[32743,0],0] ORTE_ERROR_LOG: File open failure in
>     file base/ras_base_allocate.c at line 187

This doesn’t have anything to do with preconnect - it indicates that mpirun was unable to open the Torque allocation file. However, it shouldn’t have “crashed”, but instead simply exited with an error message.

> Should we expect to use very large amounts of memory for
> preconnections of thousands of CPUs? And can these
>
> I am using Open MPI 1.8.2 on Linux 2.6.32 (CentOS) and an FDR InfiniBand
> network. This is probably not enough information, but I'll try to
> provide more if necessary. My knowledge of implementation is
> unfortunately very limited.
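As for the memory question: I don’t have hard numbers for 1.8.2, but with full preconnection every rank ends up holding a connection to each of the other N-1 ranks, so per-rank resources grow linearly with job size while the total number of connections grows quadratically. A back-of-the-envelope way to look at it - the per-connection figure below is a made-up placeholder, not a measured Open MPI number:

    /* connections.c - rough arithmetic for fully pre-connected jobs.
     * PER_CONN_BYTES is a hypothetical placeholder; the real
     * per-connection footprint depends on the BTL/interconnect and
     * would have to be measured. */
    #include <stdio.h>

    int main(void)
    {
        /* Job sizes mentioned in the report above. */
        const long long ranks[] = { 1000, 4000, 5600, 6000 };
        const long long PER_CONN_BYTES = 64 * 1024;   /* hypothetical */
        int i, ncases = (int)(sizeof(ranks) / sizeof(ranks[0]));

        for (i = 0; i < ncases; i++) {
            long long n        = ranks[i];
            long long per_rank = n - 1;            /* peers per rank  */
            long long pairs    = n * (n - 1) / 2;  /* unique pairs    */
            double mib = (double)(per_rank * PER_CONN_BYTES)
                         / (1024.0 * 1024.0);
            printf("%6lld ranks: %6lld conns/rank, %12lld pairs, "
                   "~%.0f MiB/rank at %lld KiB/conn\n",
                   n, per_rank, pairs, mib, PER_CONN_BYTES / 1024);
        }
        return 0;
    }

Whether that blows past ~2 GiB/core depends entirely on the real per-connection cost for your fabric, which you’d have to measure; the default lazy connection setup (i.e., without preconnect) avoids paying it for peers a rank never talks to.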