On Apr 23, 2013, at 10:45 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> On Tue, Apr 23, 2013 at 10:17:46AM -0700, Ralph Castain wrote:
>> 
>> On Apr 23, 2013, at 10:09 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>> 
>>> On Tue, Apr 23, 2013 at 12:21:49PM +0400, [sender's name garbled by 
>>> encoding] wrote:
>>>> Hi,
>>>> 
>>>> Nathan, could you please advise what the expected startup time is for an
>>>> Open MPI job at such a scale (128K ranks)? I'm interested in:
>>>> 1) time from mpirun start to completion of MPI_Init()
>>> 
>>> It takes less than a minute to run:
>>> 
>>> mpirun -n 131072 /bin/true
>>> 
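>>> (Since /bin/true exits immediately, the wall-clock time of that command
>>> approximates pure launch-and-teardown overhead; one simple way to capture
>>> it is the shell's time builtin:)
>>> 
>>>   time mpirun -n 131072 /bin/true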
>>> 
>>>> 2) time from MPI_Init() start to completion of MPI_Init()
>>> 
>>> A simple MPI application took about 1.25 minutes to run. If you want to 
>>> see our setup, you can take a look at contrib/platform/lanl/cray_xe6.
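>>> 
>>> (For item 2 specifically, a minimal C sketch; it brackets MPI_Init()
>>> with gettimeofday() because MPI_Wtime() cannot be used before MPI_Init()
>>> returns:)
>>> 
>>>   #include <mpi.h>
>>>   #include <stdio.h>
>>>   #include <sys/time.h>
>>> 
>>>   /* wall-clock seconds from gettimeofday() */
>>>   static double now(void)
>>>   {
>>>       struct timeval tv;
>>>       gettimeofday(&tv, NULL);
>>>       return tv.tv_sec + tv.tv_usec / 1.0e6;
>>>   }
>>> 
>>>   int main(int argc, char **argv)
>>>   {
>>>       double t0 = now();            /* just before MPI_Init */
>>>       MPI_Init(&argc, &argv);
>>>       double t1 = now();            /* just after MPI_Init */
>>> 
>>>       int rank;
>>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>       if (0 == rank)
>>>           printf("MPI_Init took %.2f s\n", t1 - t0);
>>> 
>>>       MPI_Finalize();
>>>       return 0;
>>>   }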
>>> 
>>>> From my experience, for a 52800-rank job:
>>>> 1) took around 20 min
>>>> 2) took around 12 min
>>>> which actually looks like a hang.
>>> 
>>> How many nodes? I have never seen launch times that bad on Cielo. You could 
>>> try adding -mca routed debruijn -novm and see if that helps. It will reduce 
>>> the amount of communication between compute nodes and the login node.
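>>> 
>>> (For example, against a hypothetical application ./app:)
>>> 
>>>   mpirun -mca routed debruijn -novm -n 52800 ./app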
>> 
>> I believe the debruijn module was turned off a while ago due to a bug that 
>> wasn't fixed. However, try using
> 
> Was it turned off, or was the priority lowered? If it was lowered, then -mca 
> routed debruijn should work. The -novm is to avoid the bug (as I understand 
> it). I am working on fixing the bug now in the hope it will be ready for 1.7.2.

Pretty sure it is ompi_ignored and thus not in the tarball.

> 
> -Nathan

