Hello Ken,

Could you give the details of the allocation request (qsub args) as well as the mpirun command line args? I'm trying to reproduce this on the NERSC system.
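For reference, what would help is something along the lines of "qsub -I -l mppwidth=32,walltime=00:30:00" followed by "mpirun -np 4 ./cpi-master" - those values are purely made up on my end, but the exact allocation line, the exact mpirun invocation, and how many nodes the processes were spread across would let me set up the same layout here.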
It would also be interesting to know, if you have access to a similarly sized non-Cray cluster, whether you see the same problems there.

Howard

2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:

> I don’t have a Cray, but let me see if I can reproduce this on something else.
>
> On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US) <kenneth.w.leiter2....@mail.mil> wrote:
> >
> > Hello,
> >
> > I am attempting to use the Open MPI development master for a code that uses dynamic process management (i.e. MPI_Comm_spawn) on our Cray XC40 at the Army Research Laboratory. After reading through the mailing list, I came to the conclusion that the master branch is the only hope for getting this to work on the newer Cray machines.
> >
> > To test, I am using the cpi-master.c / cpi-worker.c example. The test works when executing on a small number of processors, five or fewer, but begins to fail with segmentation faults in orted when using more processors. Even with five or fewer processors, I am spreading the computation across more than one node. I am using the Cray ugni btl through the ALPS scheduler.
> >
> > I get a core file from orted and have tracked the seg fault down to pmix_server_process_msgs.c:420, where req->proxy is NULL. I have tried reading the code to understand how this happens, but am unsure. I do see that in the if statement where I take the else branch, the other branch specifically checks "if (NULL == req->proxy)" - however, no such check is done in the else branch.
> >
> > I have debug output dumped for the failing runs. I can provide the output along with ompi_info output and config.log to anyone who is interested.
> >
> > - Ken Leiter
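PS: for anyone else following the thread, the spawn pattern Ken is exercising is essentially the one below. This is my own minimal sketch, not the actual cpi-master.c from the test; the worker binary name and the count of 8 are placeholders.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm workers;
    double pi = 0.0;

    MPI_Init(&argc, &argv);

    /* Spawn the worker processes; "./cpi-worker" and the count of 8
     * are placeholders, not the values from Ken's runs. */
    MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, 8, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

    /* Collect the workers' partial sums over the intercommunicator;
     * the receiving root on the parent side passes MPI_ROOT. */
    MPI_Reduce(NULL, &pi, 1, MPI_DOUBLE, MPI_SUM, MPI_ROOT, workers);
    printf("pi is approximately %.16f\n", pi);

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}

The workers would call MPI_Comm_get_parent(), compute their slice of the integral, and make the matching MPI_Reduce call with root 0 on the parent intercommunicator before disconnecting. Each spawn like this has orted set up the new processes, which is presumably the PMIx server path that Ken's backtrace points at.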
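The failure Ken describes - one branch checking "if (NULL == req->proxy)" while the else branch dereferences req->proxy without a check - has roughly the shape below. The types and names here are made up for illustration; this is not the actual pmix_server_process_msgs.c code.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int rank; } proxy_t;
typedef struct { proxy_t *proxy; int is_local; } request_t;

static void process_request(request_t *req)
{
    if (req->is_local) {
        if (NULL == req->proxy) {      /* guarded branch */
            fprintf(stderr, "no proxy yet, deferring request\n");
            return;
        }
        printf("local request via proxy %d\n", req->proxy->rank);
    } else {
        /* unguarded branch: if req->proxy can be NULL here, this
         * dereference is where a daemon would take the segfault */
        printf("remote request via proxy %d\n", req->proxy->rank);
    }
}

int main(void)
{
    request_t req = { NULL, 0 };
    process_request(&req);   /* crashes, mirroring the reported failure */
    return EXIT_SUCCESS;
}

If the real else branch can legitimately be reached before the proxy is set, it presumably needs the same guard (or the proxy needs to be filled in earlier); the debug output and core file Ken offered should help show which.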