Hmmm…well, it will generate some output, so keep the system down to two nodes if you can, just to minimize the chatter. Add "-mca oob_base_verbose 100" to your mpirun command line.
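
For reference, a rough sketch of what that command line could look like. The ./a.out binary is the test program from earlier in the thread; the "-np 2" process count, the log file name, and the RUN switch are assumptions made for illustration only, so adjust them to your allocation.

```shell
#!/bin/sh
# Assumptions (not from the thread): two processes, a log file called
# oob_verbose.log, and a RUN=1 switch so nothing launches by accident.
APP=${APP:-./a.out}
LOG=${LOG:-oob_verbose.log}

# Command line with the OOB verbosity flag added.
CMD="mpirun -np 2 -mca oob_base_verbose 100 $APP"
echo "$CMD"

# Launch only when explicitly requested; $CMD is left unquoted on purpose
# so the shell splits it into separate arguments.
if [ "${RUN:-0}" = "1" ]; then
    $CMD >"$LOG" 2>&1
    echo "mpirun exit status: $? (output in $LOG)"
fi
```

Run it with RUN=1 inside a two-node allocation; stdout and stderr both go to the log file, which keeps the verbose output in one place.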
> On Mar 25, 2015, at 8:45 AM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:
>
> Hi Ralph,
>
> There is no OMPI in system space and PATH and LD_LIBRARY_PATH look good.
> Any suggestion on how to get more relevant debugging info above the table?
>
> Thanks
>
> Mark
>
>
>> On 25 Mar 2015, at 16:33 , Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Hey Mark
>>
>> Your original error flag indicates that you are picking up a connection from
>> some proc built against a different OMPI installation. It’s a very low-level
>> check that looks for matching version numbers. Not sure who is trying to
>> connect, but that is the problem.
>>
>> Check your LD_LIBRARY_PATH
>>
>>> On Mar 25, 2015, at 7:46 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>> turn off the disable getpwuid.
>>>
>>> On Mar 25, 2015 8:14 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu>
>>> wrote:
>>> Hi Howard,
>>>
>>>> On 25 Mar 2015, at 14:58 , Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> How are you building ompi?
>>>
>>> My configure is rather straightforward:
>>> ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
>>>
>>> Maybe I got spoiled on Hopper/Edison and I need more explicit configuration
>>> on BW ...
>>>
>>>> Also what happens if you use aprun?
>>>
>>> Not sure if you meant in combination with mpirun or not, so I'll provide
>>> both:
>>>
>>>> aprun -n2 ./a.out
>>> Hello from rank 1, thread 0, on nid16869. (core affinity = 0)
>>> Hello from rank 0, thread 0, on nid16868. (core affinity = 0)
>>> After sleep from rank 1, thread 0, on nid16869. (core affinity = 0)
>>> After sleep from rank 0, thread 0, on nid16868. (core affinity = 0)
>>> Application 23791589 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks
>>> ~13229, outblocks ~66
>>>
>>>> aprun -n2 mpirun ./a.out
>>> apstat: error opening /ufs/alps_shared/reservations: No such file or
>>> directory
>>> apstat: error opening /ufs/alps_shared/reservations: No such file or
>>> directory
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file
>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>> Application 23791590 exit codes: 1
>>> Application 23791590 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks
>>> ~9596, outblocks ~478
>>>
>>>> I work with ompi on the nersc edison and hopper daily.
>>>
>>> I use Edison and Hopper too, and there it works for me indeed.
>>>
>>>> typically i use aprun though.
>>>
>>> I want to use orte-submit and friends, so I "explicitly" don't want to use
>>> aprun.
>>>
>>>> you definitely don't need to use ccm.
>>>> and shouldn't.
>>>
>>> Depends on the use-case, but happy to leave that out of scope for now :-)
>>>
>>> Thanks!
>>>
>>> Mark
>>>
>>>
>>>>
>>>> On Mar 25, 2015 6:00 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu>
>>>> wrote:
>>>> Hi,
>>>>
>>>> Any users of Open MPI on Blue Waters here?
>>>> And then I specifically mean in "native" mode, not inside CCM.
>>>>
>>>> After configuring and building as I do on other Crays, mpirun gives me
>>>> the following:
>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in
>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in
>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>>
>>>> Version is the latest and greatest from git.
>>>>
>>>> So I'm interested to hear whether people have been successful on Blue
>>>> Waters and/or whether the error rings a bell for people.
>>>>
>>>> Thanks!
>>>>
>>>> Mark
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2015/03/26505.php