Howard: could you add --display-devel-map and --display-allocation and send the output along? I’d like to see why it thinks you are oversubscribed.
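For example, something like this (the executable name here is just a placeholder; use whatever your failing run uses):

mpirun --display-devel-map --display-allocation ./cpi-master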
Thanks

> On Jun 11, 2015, at 11:36 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
> Hi Ken,
>
> Could you post the output of your ompi_info?
>
> I have PrgEnv-gnu/5.2.56 and gcc/4.9.2 loaded in my env on the NERSC system,
> with the following configure line:
>
> ./configure --enable-mpi-java --prefix=my_favorite_install_location
>
> The general rule of thumb on Crays with master (not with older versions,
> though) is that you should be able to do a ./configure (install location)
> and you're ready to go; no need for complicated platform files, etc., just
> to build vanilla.
>
> As you're probably guessing, I'm going to say it works for me, at least up
> to 68 slave ranks.
>
> I do notice there's some glitch with the mapping of the ranks, though. The
> binding logic seems to think there's oversubscription of cores even when
> there should not be. I had to use the
>
> --bind-to none
>
> option on the command line once I asked for more than 22 slave ranks (an
> example invocation is appended at the end of this post). The Edison system
> has 24 cores/node.
>
> Howard
>
> 2015-06-11 12:10 GMT-06:00 Leiter, Kenneth W CIV USARMY ARL (US)
> <kenneth.w.leiter2....@mail.mil>:
> I will try on a non-Cray machine as well.
>
> - Ken
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Howard Pritchard
> Sent: Thursday, June 11, 2015 12:21 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] orted segmentation fault in pmix on master
>
> Hello Ken,
>
> Could you give the details of the allocation request (qsub args) as well as
> the mpirun command-line args? I'm trying to reproduce on the NERSC system.
>
> It would also be interesting, if you have access to a similarly sized
> non-Cray cluster, to see whether you get the same problems there.
>
> Howard
>
> 2015-06-11 9:13 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>
> I don’t have a Cray, but let me see if I can reproduce this on something else.
>
> > On Jun 11, 2015, at 7:26 AM, Leiter, Kenneth W CIV USARMY ARL (US)
> > <kenneth.w.leiter2....@mail.mil> wrote:
> >
> > Hello,
> >
> > I am attempting to use the Open MPI development master for a code that
> > uses dynamic process management (i.e., MPI_Comm_spawn) on our Cray XC40
> > at the Army Research Laboratory. After reading through the mailing list,
> > I came to the conclusion that the master branch is the only hope for
> > getting this to work on the newer Cray machines.
> >
> > To test, I am using the cpi-master.c / cpi-worker.c example (a minimal
> > sketch of the spawn pattern is appended at the end of this post). The
> > test works when executing on a small number of processors, five or fewer,
> > but begins to fail with segmentation faults in orted when using more
> > processors. Even with five or fewer processors, I am spreading the
> > computation across more than one node. I am using the Cray ugni btl
> > through the ALPS scheduler.
> >
> > I get a core file from orted and have tracked the segfault down to
> > pmix_server_process_msgs.c:420, where req->proxy is NULL. I have tried
> > reading the code to understand how this happens, but am unsure. I do see
> > that in the if statement where I take the else branch, the other branch
> > specifically checks "if (NULL == req->proxy)"; however, no such check is
> > done in the else branch.
> >
> > I have debug output dumped for the failing runs.
> > I can provide the output along with ompi_info output and config.log to
> > anyone who is interested.
> >
> > - Ken Leiter
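For readers unfamiliar with the test, here is a minimal sketch of the
MPI_Comm_spawn pattern the cpi-master/cpi-worker example exercises. The file
name, worker executable name, and worker count below are illustrative
assumptions, not the actual test source:

/* spawn-master.c -- minimal MPI_Comm_spawn sketch; illustrative only,
 * not the actual cpi-master.c test source. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm workers;
    const int nworkers = 4;       /* assumed worker count */
    int errcodes[4];

    MPI_Init(&argc, &argv);

    /* Spawn the worker executable; this dynamic-process path is what
     * exercises orted/pmix on the Cray. */
    MPI_Comm_spawn("./spawn-worker", MPI_ARGV_NULL, nworkers,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF, &workers, errcodes);

    /* ... exchange data with the workers over the intercommunicator ... */

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}

The spawned workers obtain the matching intercommunicator by calling
MPI_Comm_get_parent and communicate with the master over it.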
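The --bind-to none workaround Howard mentions goes on the mpirun command line,
e.g. (again, the executable name is a placeholder):

mpirun --bind-to none ./cpi-master

This disables the binding logic entirely, which sidesteps the spurious
oversubscription complaint at the cost of letting ranks migrate across cores.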