Oh come on, Howard - before you go dumping more components into the system, let’s explore WHY he hit this problem.
Geez…

> On Mar 25, 2015, at 9:16 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
> kind of working fine. I don't like users having to add these kind of specialized --mca settings
> just to get something to work. sounds like time for yet another cray specific component.
>
> 2015-03-25 10:14 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
> It’s working just fine, Howard - we found the problem.
>
>> On Mar 25, 2015, at 9:12 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>
>> Mark,
>>
>> If you're wanting to use the orte-submit feature, you will need to get mpirun working.
>>
>> Could you rerun using the mpirun launch method but with
>>
>> --mca oob_base_verbose 10 --mca ess_base_verbose 2
>>
>> set?
>>
>> Also, you may want to make sure you are using the ipogif0 eth device. This
>> can be controlled using the oob_tcp_if_include mca parameter, i.e.
>>
>> mpirun --mca oob_tcp_if_include ipogif0
>>
>> I'm assuming your use case doesn't require connectivity between processes
>> running on the compute nodes and some external service in suggesting this option.
>>
>> 2015-03-25 8:14 GMT-06:00 Mark Santcroos <mark.santcr...@rutgers.edu>:
>> Hi Howard,
>>
>> > On 25 Mar 2015, at 14:58 , Howard Pritchard <hpprit...@gmail.com> wrote:
>> > How are you building ompi?
>>
>> My configure is rather straightforward:
>> ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
>>
>> Maybe I got spoiled on Hopper/Edison and I need more explicit configuration on BW ...
>>
>> > Also what happens if you use. aprun.
>>
>> Not sure if you meant in combination with mpirun or not, so I'll provide both:
>>
>> > aprun -n2 ./a.out
>> Hello from rank 1, thread 0, on nid16869. (core affinity = 0)
>> Hello from rank 0, thread 0, on nid16868. (core affinity = 0)
>> After sleep from rank 1, thread 0, on nid16869. (core affinity = 0)
>> After sleep from rank 0, thread 0, on nid16868. (core affinity = 0)
>> Application 23791589 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks ~13229, outblocks ~66
>>
>> > aprun -n2 mpirun ./a.out
>> apstat: error opening /ufs/alps_shared/reservations: No such file or directory
>> apstat: error opening /ufs/alps_shared/reservations: No such file or directory
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>> Application 23791590 exit codes: 1
>> Application 23791590 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks ~9596, outblocks ~478
>>
>> > I work with ompi on the nersc edison and hopper daily.
>>
>> I use Edison and Hopper too, and there it works for me indeed.
>>
>> > typically i use aprun though.
>>
>> I want to use orte-submit and friends, so I "explicitly" don't want to use aprun.
>>
>> > you definitely dont need to use ccm.
>> > and shouldnt.
>>
>> Depends on the use-case, but happy to leave that out of scope for now :-)
>>
>> Thanks!
>>
>> Mark
>>
>> >
>> > On Mar 25, 2015 6:00 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> wrote:
>> > Hi,
>> >
>> > Any users of Open MPI on Blue Waters here?
>> > And then I specifically mean in "native" mode, not inside CCM.
>> >
>> > After configuring and building as I do on other Cray's, mpirun gives me the following:
>> > [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>> > [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>> >
>> > Version is the latest and greatest from git.
>> >
>> > So I'm interested to hear whether people have been successful on Blue
>> > Waters and/or whether the error rings a bell for people.
>> >
>> > Thanks!
>> >
>> > Mark
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26505.php
>> > Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26506.php
>>
>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26507.php
>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26520.php
>
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26521.php
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26522.php