Hi,

Just to follow up on this: we have managed to get OpenMPI to run at large
scale, but to do so we had to launch with aprun instead of OpenMPI's
mpirun command.
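For reference, the launch we ended up with looks roughly like the following (paths and the redirection are illustrative; on Cray ALPS, aprun's -n is the total number of PEs and -N is PEs per node):

```shell
# Launch under Cray ALPS directly with aprun instead of OpenMPI's mpirun.
# -n : total number of MPI ranks (PEs)
# -N : ranks per node
aprun -n 37120 -N 2 $app $args > $output 2>&1
```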

While this has allowed us to run at the full scale of Titan, we have
found a huge drop in MPI_Alltoall performance when running at 18K
nodes: performance per node has decreased by a factor of roughly 200x
versus running at 4.6K nodes.  Is there any obvious explanation for this
that I could have overlooked, such as a buffer size or option that needs
to be set (configure option or environment variable) when running at such
large scale?  We are doing inter-communicator one-way sends, if that
makes any difference.
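For example, I could imagine forcing a different alltoall algorithm via the tuned collective component; something like the sketch below (the algorithm ids come from "ompi_info --param coll tuned --level 9" and may differ between builds, so this is a guess at the kind of knob I am after, not a known fix):

```shell
# Enable per-operation algorithm selection in the "tuned" coll component,
# then pin MPI_Alltoall to a specific algorithm (0 = let Open MPI decide).
export OMPI_MCA_coll_tuned_use_dynamic_rules=1
export OMPI_MCA_coll_tuned_alltoall_algorithm=2   # e.g. pairwise exchange
```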

Yours optimistically,

Mike.


On 4/22/13 3:17 PM, "Mike Clark" <mcl...@nvidia.com> wrote:

>Hi,
>
>I am trying to run OpenMPI on the Cray XK7 system at Oak Ridge National
>Lab (Titan), and am running into an issue whereby MPI_Init seems to hang
>indefinitely, but this issue only arises at large scale, e.g., when
>running on 18560 compute nodes (with two MPI processes per node).  The
>application runs successfully on 4600 nodes, and we are currently trying
>to test a 9000 node job to see if this fails or runs.
>
>We are launching our job using something like the following
>
># mpirun command  
>                  
>mpicmd="$OMP_DIR/bin/mpirun --prefix $OMP_DIR -np 37120 --npernode 2
>--bind-to core --bind-to numa $app $args"
># Print  and Run the Command
>                  
>echo $mpicmd
>$mpicmd >& $output
>
>Are there any issues that I should be aware of when running OpenMPI on
>37120 processes or when running on the Cray Gemini Interconnect?
>
>We are using OpenMPI 1.7.1 (1.7.x is required for Cray Gemini support)
>and gcc 4.7.2.
>
>Thanks,
>
>Mike.

