On Apr 22, 2013, at 3:46 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

> On Mon, Apr 22, 2013 at 03:17:16PM -0700, Mike Clark wrote:
>> Hi,
>> 
>> I am trying to run OpenMPI on the Cray XK7 system at Oak Ridge National Lab 
>> (Titan), and am running into an issue whereby MPI_Init seems to hang 
>> indefinitely, but this issue only arises at large scale, e.g., when running 
>> on 18560 compute nodes (with two MPI processes per node).  The application 
>> runs successfully on 4600 nodes, and we are currently trying to test a 9000 
>> node job to see if this fails or runs.
>> 
>> We are launching our job using something like the following
>> 
>> # mpirun command
>> mpicmd="$OMP_DIR/bin/mpirun --prefix $OMP_DIR -np 37120 --npernode 2 --bind-to core --bind-to numa $app $args"
>> # Print and run the command
>> echo $mpicmd
>> $mpicmd >& $output

Actually, I see something here that bothers me - you specify two conflicting 
binding directives (--bind-to core and --bind-to numa). I would have thought 
OMPI would abort when it sees that, but apparently not (I'll fix it). Which one 
did you actually want?

You also don't need to specify the --prefix argument because you are invoking 
mpirun via an absolute path - we detect that and automatically set the prefix.
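For example, something like this should be equivalent (just a sketch - I'm 
guessing core binding is what you actually intended):

$OMP_DIR/bin/mpirun -np 37120 --npernode 2 --bind-to core $app $args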

>> 
>> Are there any issues that I should be aware of when running OpenMPI on 37120 
>> processes or when running on the Cray Gemini Interconnect?

Do you have evidence to support your statement that MPI_Init is hanging? In 
other words, do you know for certain that the daemons on all those nodes are 
running and wired up correctly, and that it is the application itself that is 
hanging?

One way to test this is to run "hostname" as your app - if that runs 
successfully, then you have reason to believe that the daemons are starting 
correctly and successfully passing the application launch command to all nodes. 
This is one area that could fail at larger scale, as it never gets tested at 
that size.
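Something like this would do it (just a sketch, reusing the paths and launch 
geometry from your script; the sort/uniq at the end simply counts how many 
ranks reported from each node):

$OMP_DIR/bin/mpirun -np 37120 --npernode 2 hostname | sort | uniq -c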

If that works, then you could use the test code (in the source directory) 
orte/test/system/orte_no_op.c. This simply calls orte_init/finalize and 
verifies that process wireup is working.
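orte_no_op isn't installed by default, so it has to be built by hand against 
your Open MPI tree - roughly along these lines, where $OMPI_SRC and $OMPI_BUILD 
are placeholders for your source and build directories (the exact include paths 
and libraries depend on your configuration):

mpicc -I$OMPI_SRC -I$OMPI_BUILD $OMPI_SRC/orte/test/system/orte_no_op.c -lopen-rte -o orte_no_op
$OMP_DIR/bin/mpirun -np 37120 --npernode 2 ./orte_no_op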

If both of those work, then indeed it is something in the MPI layer itself. We 
could then try to find that issue.
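A quick check at that level is the hello_c example shipped in the examples/ 
directory of the source tarball (if you have it handy) - it does little more 
than MPI_Init, a line of output per rank, and MPI_Finalize:

mpicc examples/hello_c.c -o hello_c
$OMP_DIR/bin/mpirun -np 37120 --npernode 2 ./hello_c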

BTW: what version of OMPI are you using?


> We have only tested Open MPI up to 131072 ranks on 8192 nodes. Have you tried 
> running DDT on the process to see where it is hung up?
> 
> I have a Titan account so I can help with debugging. I would like to get this 
> issue fixed in 1.7.2.
> 
> -Nathan
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

