Hmmm…well, it will generate some output, so keep the system down to two nodes 
if you can, just to minimize the chatter. Add “-mca oob_base_verbose 100” to 
your command line.
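
For example, something along these lines (a sketch only: the process count of 2 and the ./a.out binary are assumptions borrowed from the aprun test further down the thread, and the log file name is made up):

```shell
# Assumption: a two-node allocation and the same ./a.out test binary used
# elsewhere in this thread; adjust both for your own job.
OOB_VERBOSE="-mca oob_base_verbose 100"
CMD="mpirun -np 2 $OOB_VERBOSE ./a.out"

# Show the exact command line before running it.
echo "$CMD"

# Only run where mpirun is on PATH; keep a copy of the verbose OOB output
# in a (hypothetically named) log file so it can be posted to the list.
if command -v mpirun >/dev/null 2>&1; then
    $CMD 2>&1 | tee oob_verbose.log
fi
```

The verbose output goes to stderr, hence the 2>&1 before tee.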

> On Mar 25, 2015, at 8:45 AM, Mark Santcroos <mark.santcr...@rutgers.edu> 
> wrote:
> 
> Hi Ralph,
> 
> There is no OMPI in system space and PATH and LD_LIBRARY_PATH look good.
> Any suggestion on how to get more relevant debugging info above the table?
> 
> Thanks
> 
> Mark
> 
> 
>> On 25 Mar 2015, at 16:33 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>> Hey Mark
>> 
>> Your original error flag indicates that you are picking up a connection from 
>> some proc built against a different OMPI installation. It’s a very low-level 
>> check that looks for matching version numbers. Not sure who is trying to 
>> connect, but that is the problem.
>> 
>> Check your LD_LIBRARY_PATH.
>> 
>>> On Mar 25, 2015, at 7:46 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>> 
>>> Turn off --disable-getpwuid in your configure line.
>>> 
>>> On Mar 25, 2015 8:14 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> 
>>> wrote:
>>> Hi Howard,
>>> 
>>>> On 25 Mar 2015, at 14:58 , Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> How are you building ompi?
>>> 
>>> My configure is rather straightforward:
>>> ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
>>> 
>>> Maybe I got spoiled on Hopper/Edison and I need more explicit configuration 
>>> on BW ...
>>> 
>>>> Also, what happens if you use aprun?
>>> 
>>> Not sure if you meant in combination with mpirun or not, so I'll provide 
>>> both:
>>> 
>>>> aprun -n2 ./a.out
>>> Hello from rank 1, thread 0, on nid16869. (core affinity = 0)
>>> Hello from rank 0, thread 0, on nid16868. (core affinity = 0)
>>> After sleep from rank 1, thread 0, on nid16869. (core affinity = 0)
>>> After sleep from rank 0, thread 0, on nid16868. (core affinity = 0)
>>> Application 23791589 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks 
>>> ~13229, outblocks ~66
>>> 
>>>> aprun -n2 mpirun ./a.out
>>> apstat: error opening /ufs/alps_shared/reservations: No such file or 
>>> directory
>>> apstat: error opening /ufs/alps_shared/reservations: No such file or 
>>> directory
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>> Application 23791590 exit codes: 1
>>> Application 23791590 resources: utime ~0s, stime ~2s, Rss ~27304, inblocks 
>>> ~9596, outblocks ~478
>>> 
>>>> I work with OMPI on NERSC's Edison and Hopper daily.
>>> 
>>> I use Edison and Hopper too, and there it works for me indeed.
>>> 
>>>> Typically I use aprun, though.
>>> 
>>> I want to use orte-submit and friends, so I "explicitly" don't want to use 
>>> aprun.
>>> 
>>>> You definitely don't need to use CCM,
>>>> and shouldn't.
>>> 
>>> Depends on the use-case, but happy to leave that out of scope for now :-)
>>> 
>>> Thanks!
>>> 
>>> Mark
>>> 
>>> 
>>>> 
>>>> On Mar 25, 2015 6:00 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> 
>>>> wrote:
>>>> Hi,
>>>> 
>>>> Any users of Open MPI on Blue Waters here?
>>>> And then I specifically mean in "native" mode, not inside CCM.
>>>> 
>>>> After configuring and building as I do on other Crays, mpirun gives me 
>>>> the following:
>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in 
>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in 
>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>> 
>>>> Version is the latest and greatest from git.
>>>> 
>>>> So I'm interested to hear whether people have been successful on Blue 
>>>> Waters and/or whether the error rings a bell for people.
>>>> 
>>>> Thanks!
>>>> 
>>>> Mark
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2015/03/26505.php
>>> 
>> 
> 
