On Apr 23, 2013, at 10:09 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> On Tue, Apr 23, 2013 at 12:21:49PM +0400, ???????????????????? ???????????? wrote:
>> Hi,
>> 
>> Nathan, could you please advise what the expected startup time is for an
>> Open MPI job at this scale (128K ranks)? I'm interested in
>> 1) time from mpirun start to completion of MPI_Init()
> 
> It takes less than a minute to run:
> 
> mpirun -n 131072 /bin/true
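> 
> You can measure that end-to-end time yourself by just prefixing the launch
> with the shell's time builtin:
> 
> time mpirun -n 131072 /bin/true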
> 
> 
>> 2) time from MPI_Init() start to completion of MPI_Init()
> 
> A simple MPI application took about 1.25 minutes to run. If you want to see
> our setup, you can take a look at contrib/platform/lanl/cray_xe6.
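> 
> If you want to measure (2) on your end, a minimal sketch along these lines
> works (it uses gettimeofday() rather than MPI_Wtime(), since the latter
> can't be called before MPI_Init() returns; the reduce reports the slowest
> rank):
> 
> #include <stdio.h>
> #include <sys/time.h>
> #include <mpi.h>
> 
> /* Wall-clock seconds; usable before MPI_Init(), unlike MPI_Wtime(). */
> static double now(void)
> {
>     struct timeval tv;
>     gettimeofday(&tv, NULL);
>     return tv.tv_sec + tv.tv_usec * 1e-6;
> }
> 
> int main(int argc, char **argv)
> {
>     double start = now();
>     MPI_Init(&argc, &argv);
>     double elapsed = now() - start;
> 
>     int rank;
>     double max_elapsed = 0.0;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     /* Report the slowest rank's MPI_Init() time. */
>     MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
>                MPI_COMM_WORLD);
>     if (rank == 0)
>         printf("max MPI_Init() time: %.2f s\n", max_elapsed);
> 
>     MPI_Finalize();
>     return 0;
> }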
> 
>> From my experience, for a 52800-rank job
>> 1) took around 20 min
>> 2) took around 12 min
>> which actually looks like a hang.
> 
> How many nodes? I have never seen launch times that bad on Cielo. You could 
> try adding -mca routed debruijn -novm and see if that helps. It will reduce 
> the amount of communication between compute nodes and the login node.

I believe the debruijn module was turned off a while ago due to a bug that was 
never fixed. However, try using

"-mca routed radix -mca routed_radix 64"


> 
>> Any advice on how to improve the startup times of large-scale jobs would be
>> very much appreciated.
> 
> The bottleneck for launching at scale is the initial communication between 
> the orteds and mpirun. At this time I don't know what can be done to improve 
> that (I have some ideas but nothing has been implemented yet). At 8192 nodes 
> this takes less than a minute. Everything else should be fairly quick.

I mentioned this to Pasha on the phone call today. We had previously 
collaborated to get a pretty fast startup time on this machine - I believe we 
used static ports to reduce the initial comm scaling. You might want to check 
with him.
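
If you have trouble finding the right knobs, the static port settings live in 
the oob/tcp component (the exact parameter names have varied across releases, 
so check what your build exposes):

ompi_info --param oob tcp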

> 
> -Nathan Hjelm
> HPC-3, LANL