Hmmm…okay, sorry to keep drilling down here, but let’s try adding “-mca 
sec_base_verbose 100” now
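For reference, combined with the previous flag the full command would look something like this (just a sketch, reusing the install path from your log below):

```shell
/u/sciteam/marksant/openmpi/installation/bin/mpirun \
    -mca oob_base_verbose 100 \
    -mca sec_base_verbose 100 \
    ./a.out
```

That should show which sec component each side selects for the connect-ack credential check, which is where the authentication is failing.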

> On Mar 25, 2015, at 8:51 AM, Mark Santcroos <mark.santcr...@rutgers.edu> 
> wrote:
> 
> marksant@nid25257:~> /u/sciteam/marksant/openmpi/installation/bin/mpirun -mca 
> oob_base_verbose 100 ./a.out 
> [nid25257:09350] mca: base: components_register: registering oob components
> [nid25257:09350] mca: base: components_register: found loaded component usock
> [nid25257:09350] mca: base: components_register: component usock register 
> function successful
> [nid25257:09350] mca: base: components_register: found loaded component alps
> [nid25257:09350] mca: base: components_register: component alps register 
> function successful
> [nid25257:09350] mca: base: components_register: found loaded component ud
> [nid25257:09350] mca: base: components_register: component ud register 
> function successful
> [nid25257:09350] mca: base: components_register: found loaded component tcp
> [nid25257:09350] mca: base: components_register: component tcp register 
> function successful
> [nid25257:09350] mca: base: components_open: opening oob components
> [nid25257:09350] mca: base: components_open: found loaded component usock
> [nid25257:09350] mca: base: components_open: component usock open function 
> successful
> [nid25257:09350] mca: base: components_open: found loaded component alps
> [nid25257:09350] mca: base: components_open: component alps open function 
> successful
> [nid25257:09350] mca: base: components_open: found loaded component ud
> [nid25257:09350] mca: base: components_open: component ud open function 
> successful
> [nid25257:09350] mca: base: components_open: found loaded component tcp
> [nid25257:09350] mca: base: components_open: component tcp open function 
> successful
> [nid25257:09350] mca:oob:select: checking available component usock
> [nid25257:09350] mca:oob:select: Querying component [usock]
> [nid25257:09350] oob:usock: component_available called
> [nid25257:09350] [[8913,0],0] USOCK STARTUP
> [nid25257:09350] SUNPATH: 
> /var/tmp/openmpi-sessions-45504@nid25257_0/8913/0/usock
> [nid25257:09350] [[8913,0],0] START USOCK LISTENING ON 
> /var/tmp/openmpi-sessions-45504@nid25257_0/8913/0/usock
> [nid25257:09350] mca:oob:select: Adding component to end
> [nid25257:09350] mca:oob:select: checking available component alps
> [nid25257:09350] mca:oob:select: Querying component [alps]
> [nid25257:09350] mca:oob:select: Skipping component [alps] - no available 
> interfaces
> [nid25257:09350] mca:oob:select: checking available component ud
> [nid25257:09350] mca:oob:select: Querying component [ud]
> [nid25257:09350] oob:ud: component_available called
> [nid25257:09350] [[8913,0],0] oob:ud:component_init no devices found
> [nid25257:09350] mca:oob:select: Skipping component [ud] - failed to startup
> [nid25257:09350] mca:oob:select: checking available component tcp
> [nid25257:09350] mca:oob:select: Querying component [tcp]
> [nid25257:09350] oob:tcp: component_available called
> [nid25257:09350] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
> [nid25257:09350] [[8913,0],0] oob:tcp:init rejecting loopback interface lo
> [nid25257:09350] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
> [nid25257:09350] [[8913,0],0] oob:tcp:init rejecting loopback interface lo
> [nid25257:09350] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4
> [nid25257:09350] [[8913,0],0] oob:tcp:init adding 10.128.99.112 to our list 
> of V4 connections
> [nid25257:09350] [[8913,0],0] TCP STARTUP
> [nid25257:09350] [[8913,0],0] attempting to bind to IPv4 port 0
> [nid25257:09350] [[8913,0],0] assigned IPv4 port 35917
> [nid25257:09350] mca:oob:select: Adding component to end
> [nid25257:09350] mca:oob:select: Found 2 active transports
> [nid25257:09350] [[8913,0],0] mca_oob_tcp_listen_thread: new connection: (16, 
> 0) 10.128.69.144:46745
> [nid25257:09350] [[8913,0],0] connection_handler: working connection (16, 2) 
> 10.128.69.144:46745
> [nid25257:09350] [[8913,0],0] accept_connection: 10.128.69.144:46745
> [nid25257:09350] [[8913,0],0]:tcp:recv:handler called
> [nid25257:09350] [[8913,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 16
> [nid25257:09350] [[8913,0],0] waiting for connect ack from UNKNOWN
> [nid25257:09350] [[8913,0],0] connect ack received from UNKNOWN
> [nid25257:09350] [[8913,0],0] connect-ack recvd from UNKNOWN
> [nid25257:09350] [[8913,0],0] mca_oob_tcp_recv_connect: connection from new 
> peer
> [nid25257:09350] [[8913,0],0] connect-ack header from [[8913,0],2] is okay
> [nid25257:09350] [[8913,0],0] waiting for connect ack from [[8913,0],2]
> [nid25257:09350] [[8913,0],0] connect ack received from [[8913,0],2]
> [nid25257:09350] [[8913,0],0] connect-ack version from [[8913,0],2] matches 
> ours
> [nid25257:09350] [[8913,0],0] ORTE_ERROR_LOG: Authentication failed in file 
> ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
> [nid25257:09350] [[8913,0],0] mca_oob_tcp_listen_thread: new connection: (17, 
> 11) 10.128.69.143:33434
> [nid25257:09350] [[8913,0],0] connection_handler: working connection (17, 0) 
> 10.128.69.143:33434
> [nid25257:09350] [[8913,0],0] accept_connection: 10.128.69.143:33434
> [nid25257:09350] [[8913,0],0]:tcp:recv:handler called
> [nid25257:09350] [[8913,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 17
> [nid25257:09350] [[8913,0],0] waiting for connect ack from UNKNOWN
> [nid25257:09350] [[8913,0],0] connect ack received from UNKNOWN
> [nid25257:09350] [[8913,0],0] connect-ack recvd from UNKNOWN
> [nid25257:09350] [[8913,0],0] mca_oob_tcp_recv_connect: connection from new 
> peer
> [nid25257:09350] [[8913,0],0] connect-ack header from [[8913,0],1] is okay
> [nid25257:09350] [[8913,0],0] waiting for connect ack from [[8913,0],1]
> [nid25257:09350] [[8913,0],0] connect ack received from [[8913,0],1]
> [nid25257:09350] [[8913,0],0] connect-ack version from [[8913,0],1] matches 
> ours
> [nid25257:09350] [[8913,0],0] ORTE_ERROR_LOG: Authentication failed in file 
> ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
> 
> 
>> On 25 Mar 2015, at 16:49 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>> Hmmm…well, it will generate some output, so keep the system down to two 
>> nodes if you can just to minimize the chatter. Add “-mca oob_base_verbose 
>> 100” to your cmd line
>> 
>>> On Mar 25, 2015, at 8:45 AM, Mark Santcroos <mark.santcr...@rutgers.edu> 
>>> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> There is no OMPI in system space and PATH and LD_LIBRARY_PATH look good.
>>> Any suggestions on how to get more relevant debugging info on the table?
>>> 
>>> Thanks
>>> 
>>> Mark
>>> 
>>> 
>>>> On 25 Mar 2015, at 16:33 , Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>> Hey Mark
>>>> 
>>>> Your original error flag indicates that you are picking up a connection 
>>>> from some proc built against a different OMPI installation. It’s a very 
>>>> low-level check that looks for matching version numbers. Not sure who is 
>>>> trying to connect, but that is the problem.
>>>> 
>>>> Check your LD_LIBRARY_PATH
>>>> 
>>>>> On Mar 25, 2015, at 7:46 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>> 
>>>>> Turn off the --disable-getpwuid option.
>>>>> 
>>>>> On Mar 25, 2015 8:14 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> 
>>>>> wrote:
>>>>> Hi Howard,
>>>>> 
>>>>>> On 25 Mar 2015, at 14:58 , Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>> How are you building ompi?
>>>>> 
>>>>> My configure is rather straightforward:
>>>>> ./configure --prefix=$OMPI_PREFIX --disable-getpwuid
>>>>> 
>>>>> Maybe I got spoiled on Hopper/Edison and I need more explicit 
>>>>> configuration on BW ...
>>>>> 
>>>>>> Also what happens if you use. aprun.
>>>>> 
>>>>> Not sure if you meant in combination with mpirun or not, so I'll provide 
>>>>> both:
>>>>> 
>>>>>> aprun -n2 ./a.out
>>>>> Hello from rank 1, thread 0, on nid16869. (core affinity = 0)
>>>>> Hello from rank 0, thread 0, on nid16868. (core affinity = 0)
>>>>> After sleep from rank 1, thread 0, on nid16869. (core affinity = 0)
>>>>> After sleep from rank 0, thread 0, on nid16868. (core affinity = 0)
>>>>> Application 23791589 resources: utime ~0s, stime ~2s, Rss ~27304, 
>>>>> inblocks ~13229, outblocks ~66
>>>>> 
>>>>>> aprun -n2 mpirun ./a.out
>>>>> apstat: error opening /ufs/alps_shared/reservations: No such file or 
>>>>> directory
>>>>> apstat: error opening /ufs/alps_shared/reservations: No such file or 
>>>>> directory
>>>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>>>> [nid16868:17876] [[699,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 159
>>>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 85
>>>>> [nid16869:17034] [[9344,0],0] ORTE_ERROR_LOG: File open failure in file 
>>>>> ../../../../orte/mca/ras/base/ras_base_allocate.c at line 190
>>>>> Application 23791590 exit codes: 1
>>>>> Application 23791590 resources: utime ~0s, stime ~2s, Rss ~27304, 
>>>>> inblocks ~9596, outblocks ~478
>>>>> 
>>>>>> I work with OMPI on the NERSC Edison and Hopper daily.
>>>>> 
>>>>> I use Edison and Hopper too, and there it works for me indeed.
>>>>> 
>>>>>> Typically I use aprun, though.
>>>>> 
>>>>> I want to use orte-submit and friends, so I "explicitly" don't want to 
>>>>> use aprun.
>>>>> 
>>>>>> You definitely don't need to use CCM, and shouldn't.
>>>>> 
>>>>> Depends on the use-case, but happy to leave that out of scope for now :-)
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Mark
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On Mar 25, 2015 6:00 AM, "Mark Santcroos" <mark.santcr...@rutgers.edu> 
>>>>>> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Any users of Open MPI on Blue Waters here?
>>>>>> And then I specifically mean in "native" mode, not inside CCM.
>>>>>> 
>>>>>> After configuring and building as I do on other Crays, mpirun gives me 
>>>>>> the following:
>>>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in 
>>>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>>>> [nid25263:31700] [[23896,0],0] ORTE_ERROR_LOG: Authentication failed in 
>>>>>> file ../../../../../orte/mca/oob/tcp/oob_tcp_connection.c at line 803
>>>>>> 
>>>>>> Version is the latest and greatest from git.
>>>>>> 
>>>>>> So I'm interested to hear whether people have been successful on Blue 
>>>>>> Waters and/or whether the error rings a bell for people.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Mark
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post: 
>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26505.php