Ken,

Could you try to launch the job with aprun instead of mpirun?
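For example, instead of something along the lines of

  mpirun -np 4 ./cpi-master

try launching directly through ALPS:

  aprun -n 4 ./cpi-master

(The process count and binary name here are just placeholders - use whatever matches your qsub allocation and your actual test.)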
Thanks,
Josh

On Thu, Jun 11, 2015 at 12:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> Hello Ken,
>
> Could you give the details of the allocation request (qsub args)
> as well as the mpirun command line args? I'm trying to reproduce
> this on the NERSC system.
>
> It would also be interesting to know whether you see the same problems
> on a similarly sized non-Cray cluster, if you have access to one.
>
> Howard
>
>
> 2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>
>> I don’t have a Cray, but let me see if I can reproduce this on
>> something else.
>>
>> > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US)
>> > <kenneth.w.leiter2....@mail.mil> wrote:
>> >
>> > Hello,
>> >
>> > I am attempting to use the Open MPI development master for a code that
>> > uses dynamic process management (i.e. MPI_Comm_spawn) on our Cray XC40
>> > at the Army Research Laboratory. After reading through the mailing
>> > list, I came to the conclusion that the master branch is the only hope
>> > for getting this to work on the newer Cray machines.
>> >
>> > To test, I am using the cpi-master.c / cpi-worker.c example. The test
>> > works when executing on a small number of processors, five or fewer,
>> > but begins to fail with segmentation faults in orted when using more
>> > processors. Even with five or fewer processors, I am spreading the
>> > computation across more than one node. I am using the Cray ugni btl
>> > through the ALPS scheduler.
>> >
>> > I get a core file from orted and have tracked the segfault down to
>> > pmix_server_process_msgs.c:420, where req->proxy is NULL. I have tried
>> > reading the code to understand how this happens, but am unsure. I do
>> > see that in the if statement where I take the else branch, the other
>> > branch specifically checks "if (NULL == req->proxy)" - however, no such
>> > check is done in the else branch.
>> >
>> > I have debug output dumped for the failing runs. I can provide the
>> > output, along with ompi_info output and config.log, to anyone who is
>> > interested.
>> >
>> > - Ken Leiter
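For anyone trying to reproduce this on another system, below is a rough
sketch of the master side of the MPI_Comm_spawn pattern that the
cpi-master.c / cpi-worker.c test exercises. This is not Ken's actual test
code - the worker binary name, the worker count, and the reduction are
placeholders chosen only to illustrate the spawn call that seems to
trigger the orted failure once the job spans more processes/nodes.

/*
 * Minimal sketch of an MPI_Comm_spawn master (placeholder code, not
 * the real cpi-master.c).  Launch it as a single process; it spawns
 * the worker executable and collects one result from the workers.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm workers;       /* intercommunicator to the spawned workers */
    int nworkers = 8;       /* placeholder worker count */
    double result = 0.0;

    MPI_Init(&argc, &argv);

    /* Spawn the worker executable from the master process. */
    MPI_Comm_spawn("./cpi-worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

    /*
     * The workers would call MPI_Comm_get_parent() and contribute to a
     * reduction over the intercommunicator; the master (root group,
     * root = MPI_ROOT) only receives, so its send buffer is unused.
     */
    MPI_Reduce(NULL, &result, 1, MPI_DOUBLE, MPI_SUM, MPI_ROOT, workers);
    printf("result from %d workers: %g\n", nworkers, result);

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}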