On Apr 23, 2013, at 10:09 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> On Tue, Apr 23, 2013 at 12:21:49PM +0400, ???????????????????? ???????????? wrote:
>> Hi,
>>
>> Nathan, could you please advise what the expected startup time is for an
>> Open MPI job at such scale (128K ranks)? I'm interested in:
>> 1) time from mpirun start to completion of MPI_Init()
>
> It takes less than a minute to run:
>
>   mpirun -n 131072 /bin/true
>
>> 2) time from MPI_Init() start to completion of MPI_Init()
>
> A simple MPI application took about 1.25 minutes to run. If you want to
> see our setup, you can take a look at contrib/platform/lanl/cray_xe6.
>
>> From my experience, for a 52800-rank job,
>> 1) took around 20 min
>> 2) took around 12 min
>> which actually looks like a hang.
>
> How many nodes? I have never seen launch times that bad on Cielo. You could
> try adding "-mca routed debruijn -novm" and see if that helps. It will reduce
> the amount of communication between the compute nodes and the login node.
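The two intervals asked about above can be measured separately. As a rough sketch (the rank count matches the thread, but `mpicc` availability and scheduler setup are assumptions about your environment): timing a `/bin/true` launch bounds the mpirun-to-process-start cost, and timing a minimal MPI program that does nothing but initialize and finalize isolates the additional MPI_Init() cost.

```shell
# Phase 1: launch overhead only. /bin/true never calls MPI_Init, so this
# times mpirun/orted wireup and process launch.
time mpirun -n 131072 /bin/true

# Phase 2: a minimal MPI program that only calls MPI_Init/MPI_Finalize.
# The difference between this run and the /bin/true run approximates
# the MPI_Init() cost at this scale.
cat > init_only.c <<'EOF'
#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);   /* the interval being measured */
    MPI_Finalize();
    return 0;
}
EOF
mpicc init_only.c -o init_only
time mpirun -n 131072 ./init_only
```

These must run under the same allocation and node count for the subtraction to be meaningful.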
I believe the debruijn module was turned off a while ago due to a bug that was never fixed. However, try using "-mca routed radix -mca routed_radix 64".

>> Any advice on how to improve startup times of large-scale jobs would be
>> very much appreciated.
>
> The bottleneck for launching at scale is the initial communication between
> the orteds and mpirun. At this time I don't know what can be done to improve
> that (I have some ideas, but nothing has been implemented yet). At 8192
> nodes this takes less than a minute. Everything else should be fairly quick.

I mentioned this to Pasha on the phone call today. We had previously collaborated to get a pretty fast startup time on this machine - I believe we used static ports to reduce the initial comm scaling. You might want to check with him.

> -Nathan Hjelm
> HPC-3, LANL
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
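The two routing suggestions from the thread can be compared directly. A sketch, with `./app` standing in for your MPI binary (a placeholder, not from the thread); note that which routed components are actually present depends on how your Open MPI was built:

```shell
# Option 1 (suggested, then retracted, in the thread): debruijn routing with
# no virtual machine layer. May be unavailable if the debruijn component was
# disabled in your build.
time mpirun -mca routed debruijn -novm -n 131072 ./app

# Option 2 (the working suggestion): radix-tree routing with a fan-out of 64,
# which flattens the orted communication tree and reduces hops back to mpirun.
time mpirun -mca routed radix -mca routed_radix 64 -n 131072 ./app

# List the routed components your installation actually provides:
ompi_info | grep routed
```

A larger radix means a shallower tree (fewer relay hops during wireup) at the cost of more direct connections per daemon, which is why 64 is suggested for very large node counts.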