I just committed a potential fix to the trunk. Please let me know if it
works for you.

On May 14, 2014, at 11:44 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Ralph,
> 
>> Hmmm...well, that's an interesting naming scheme :-)
>> 
>> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line
>> and let's see what it thinks is happening
> 
> 
> tyr fd1026 105 mpiexec -np 3 --host tyr,sunpc1,linpc1 --mca oob_base_verbose 
> 10 --report-uri - hostname
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: 
> registering oob components
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: found 
> loaded component tcp
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: component 
> tcp register function successful
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: opening oob 
> components
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: found loaded 
> component tcp
> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: component tcp 
> open function successful
> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: checking available 
> component tcp
> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Querying component [tcp]
> [tyr.informatik.hs-fulda.de:06877] oob:tcp: component_available called
> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: 
> V4
> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: 
> V4
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init creating module 
> for V4 address on interface bge0
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] creating OOB-TCP module for 
> interface bge0
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init adding 
> 193.174.24.39 to our list of V4 connections
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP STARTUP
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] attempting to bind to IPv4 
> port 0
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] assigned IPv4 port 55567
> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Adding component to end
> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Found 1 active transports
> 3170566144.0;tcp://193.174.24.39:55567
> [sunpc1:07690] mca: base: components_register: registering oob components
> [sunpc1:07690] mca: base: components_register: found loaded component tcp
> [sunpc1:07690] mca: base: components_register: component tcp register 
> function successful
> [sunpc1:07690] mca: base: components_open: opening oob components
> [sunpc1:07690] mca: base: components_open: found loaded component tcp
> [sunpc1:07690] mca: base: components_open: component tcp open function 
> successful
> [sunpc1:07690] mca:oob:select: checking available component tcp
> [sunpc1:07690] mca:oob:select: Querying component [tcp]
> [sunpc1:07690] oob:tcp: component_available called
> [sunpc1:07690] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
> [sunpc1:07690] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
> [sunpc1:07690] [[48379,0],1] oob:tcp:init creating module for V4 address on 
> interface nge0
> [sunpc1:07690] [[48379,0],1] creating OOB-TCP module for interface nge0
> [sunpc1:07690] [[48379,0],1] oob:tcp:init adding 193.174.26.210 to our list 
> of V4 connections
> [sunpc1:07690] [[48379,0],1] TCP STARTUP
> [sunpc1:07690] [[48379,0],1] attempting to bind to IPv4 port 0
> [sunpc1:07690] [[48379,0],1] assigned IPv4 port 39616
> [sunpc1:07690] mca:oob:select: Adding component to end
> [sunpc1:07690] mca:oob:select: Found 1 active transports
> [sunpc1:07690] [[48379,0],1]: set_addr to uri 
> 3170566144.0;tcp://193.174.24.39:55567
> [sunpc1:07690] [[48379,0],1]:set_addr checking if peer [[48379,0],0] is 
> reachable via component tcp
> [sunpc1:07690] [[48379,0],1] oob:tcp: working peer [[48379,0],0] address 
> tcp://193.174.24.39:55567
> [sunpc1:07690] [[48379,0],1] UNFOUND KERNEL INDEX -13 FOR ADDRESS 
> 193.174.24.39
> [sunpc1:07690] [[48379,0],1] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - 
> ASSIGNING MODULE AT KINDEX 2 INTERFACE nge0
> [sunpc1:07690] [[48379,0],1] PASSING ADDR 193.174.24.39 TO INTERFACE nge0 AT 
> KERNEL INDEX 2
> [sunpc1:07690] [[48379,0],1]:tcp set addr for peer [[48379,0],0]
> [sunpc1:07690] [[48379,0],1]: peer [[48379,0],0] is reachable via component 
> tcp
> [sunpc1:07690] [[48379,0],1] OOB_SEND: 
> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
> [sunpc1:07690] [[48379,0],1]:tcp:processing set_peer cmd for interface nge0
> [sunpc1:07690] [[48379,0],1] oob:base:send to target [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] oob:tcp:send_nb to peer [[48379,0],0]:10
> [sunpc1:07690] [[48379,0],1] tcp:send_nb to peer [[48379,0],0]
> [sunpc1:07690] 
> [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508]
>  post send to [[48379,0],0]
> [sunpc1:07690] 
> [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442]
>  processing send to peer 
> [[48379,0],0]:10
> [sunpc1:07690] 
> [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476]
>  queue pending to [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] tcp:send_nb: initiating connection to 
> [[48379,0],0]
> [sunpc1:07690] 
> [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490]
>  connect to [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] via interface nge0
> [sunpc1:07690] [[48379,0],1] oob:tcp:peer creating socket to [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] via interface nge0 on socket 10
> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: 
> new connection: (15, 0) 193.174.26.210:39617
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working 
> connection (15, 11) 193.174.26.210:39617
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON 
> UNKNOWN INTERFACE
> [sunpc1:07690] [[48379,0],1] waiting for connect completion to [[48379,0],0] 
> - activating send event
> [sunpc1:07690] [[48379,0],1] tcp:send_handler called to send to peer 
> [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] tcp:send_handler CONNECTING
> [sunpc1:07690] [[48379,0],1]:tcp:complete_connect called for peer 
> [[48379,0],0] on socket 10
> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: sending ack to 
> [[48379,0],0]
> [sunpc1:07690] [[48379,0],1] SEND CONNECT ACK
> [sunpc1:07690] [[48379,0],1] send blocking of 48 bytes to socket 10
> [sunpc1:07690] [[48379,0],1] connect-ack sent to socket 10
> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: setting read event on 
> connection to [[48379,0],0]
> [linpc1:21511] mca: base: components_register: registering oob components
> [linpc1:21511] mca: base: components_register: found loaded component tcp
> [linpc1:21511] mca: base: components_register: component tcp register 
> function successful
> [linpc1:21511] mca: base: components_open: opening oob components
> [linpc1:21511] mca: base: components_open: found loaded component tcp
> [linpc1:21511] mca: base: components_open: component tcp open function 
> successful
> [linpc1:21511] mca:oob:select: checking available component tcp
> [linpc1:21511] mca:oob:select: Querying component [tcp]
> [linpc1:21511] oob:tcp: component_available called
> 
> [linpc1:21511] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
> [linpc1:21511] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
> [linpc1:21511] [[48379,0],2] oob:tcp:init creating module for V4 address on 
> interface eth0
> [linpc1:21511] [[48379,0],2] creating OOB-TCP module for interface eth0
> [linpc1:21511] [[48379,0],2] oob:tcp:init adding 193.174.26.208 to our list 
> of V4 connections
> [linpc1:21511] [[48379,0],2] TCP STARTUP
> [linpc1:21511] [[48379,0],2] attempting to bind to IPv4 port 0
> [linpc1:21511] [[48379,0],2] assigned IPv4 port 39724
> [linpc1:21511] mca:oob:select: Adding component to end
> [linpc1:21511] mca:oob:select: Found 1 active transports
> [linpc1:21511] [[48379,0],2]: set_addr to uri 
> 3170566144.0;tcp://193.174.24.39:55567
> [linpc1:21511] [[48379,0],2]:set_addr checking if peer [[48379,0],0] is 
> reachable via component tcp
> [linpc1:21511] [[48379,0],2] oob:tcp: working peer [[48379,0],0] address 
> tcp://193.174.24.39:55567
> [linpc1:21511] [[48379,0],2] UNFOUND KERNEL INDEX -13 FOR ADDRESS 
> 193.174.24.39
> [linpc1:21511] [[48379,0],2] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - 
> ASSIGNING MODULE AT KINDEX 2 INTERFACE eth0
> [linpc1:21511] [[48379,0],2] PASSING ADDR 193.174.24.39 TO INTERFACE eth0 AT 
> KERNEL INDEX 2
> [linpc1:21511] [[48379,0],2]:tcp set addr for peer [[48379,0],0]
> [linpc1:21511] [[48379,0],2]: peer [[48379,0],0] is reachable via component 
> tcp
> [linpc1:21511] [[48379,0],2] OOB_SEND: 
> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
> [linpc1:21511] [[48379,0],2]:tcp:processing set_peer cmd for interface eth0
> [linpc1:21511] [[48379,0],2] oob:base:send to target [[48379,0],0]
> [linpc1:21511] [[48379,0],2] oob:tcp:send_nb to peer [[48379,0],0]:10
> [linpc1:21511] [[48379,0],2] tcp:send_nb to peer [[48379,0],0]
> [linpc1:21511] 
> [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508]
>  post send to [[48379,0],0]
> [linpc1:21511] 
> [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442]
>  processing send to peer 
> [[48379,0],0]:10
> [linpc1:21511] 
> [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476]
>  queue pending to [[48379,0],0]
> [linpc1:21511] [[48379,0],2] tcp:send_nb: initiating connection to 
> [[48379,0],0]
> [linpc1:21511] 
> [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490]
>  connect to [[48379,0],0]
> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] via interface eth0
> [linpc1:21511] [[48379,0],2] oob:tcp:peer creating socket to [[48379,0],0]
> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] via interface eth0 on socket 9
> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect 
> to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
> [linpc1:21511] [[48379,0],2] waiting for connect completion to [[48379,0],0] 
> - activating send event
> [linpc1:21511] [[48379,0],2] tcp:send_handler called to send to peer 
> [[48379,0],0]
> [linpc1:21511] [[48379,0],2] tcp:send_handler CONNECTING
> [linpc1:21511] [[48379,0],2]:tcp:complete_connect called for peer 
> [[48379,0],0] on socket 9
> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: sending ack to 
> [[48379,0],0]
> [linpc1:21511] [[48379,0],2] SEND CONNECT ACK
> [linpc1:21511] [[48379,0],2] send blocking of 48 bytes to socket 9
> [linpc1:21511] [[48379,0],2] connect-ack sent to socket 9
> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: setting read event on 
> connection to [[48379,0],0]
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: 
> new connection: (16, 11) 193.174.26.208:53741
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working 
> connection (16, 11) 193.174.26.208:53741
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON 
> UNKNOWN INTERFACE
> ^CKilled by signal 2.
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: 
> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: 
> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target 
> [[48379,0],1]
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer 
> [[48379,0],1]
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target 
> [[48379,0],2]
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer 
> [[48379,0],2]
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
> Killed by signal 2.
> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP SHUTDOWN
> [tyr.informatik.hs-fulda.de:06877] mca: base: close: component tcp closed
> [tyr.informatik.hs-fulda.de:06877] mca: base: close: unloading component tcp
> tyr fd1026 106 
> 
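For reference when reading the log above: the URI printed by --report-uri, 3170566144.0;tcp://193.174.24.39:55567, appears to carry the same identity as the process name [[48379,0],0] in the log lines. Below is a minimal Python sketch for decoding it. It assumes the leading number packs the job family into its upper 16 bits and a local job id into its lower 16 bits; that layout is inferred only from the arithmetic (48379 * 65536 = 3170566144). The helper name parse_oob_uri is hypothetical and not part of Open MPI.

```python
from urllib.parse import urlparse


def parse_oob_uri(uri):
    """Split '<jobid>.<vpid>;tcp://<host>:<port>' into its parts.

    Hypothetical helper for reading the log; not an Open MPI API.
    """
    name, _, contact = uri.partition(";")
    jobid_str, _, vpid_str = name.partition(".")
    jobid = int(jobid_str)
    endpoint = urlparse(contact)
    return {
        "job_family": jobid >> 16,      # assumed: upper 16 bits
        "local_jobid": jobid & 0xFFFF,  # assumed: lower 16 bits
        "vpid": int(vpid_str),
        "transport": endpoint.scheme,
        "host": endpoint.hostname,
        "port": endpoint.port,
    }


# URI copied from the --report-uri output in the log above.
info = parse_oob_uri("3170566144.0;tcp://193.174.24.39:55567")
print(info)
```

Under that assumption, the decoded fields line up with [[48379,0],0] and with the address and port that tyr reported binding to.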
> 
> Thank you very much for your help in advance. Do you need anything else?
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
>> On May 14, 2014, at 9:06 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi Ralph,
>>> 
>>>> What are the interfaces on these machines?
>>> 
>>> tyr fd1026 111 ifconfig -a
>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 
>>> index 1
>>>       inet 127.0.0.1 netmask ff000000 
>>> bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>       inet 193.174.24.39 netmask ffffffe0 broadcast 193.174.24.63
>>> tyr fd1026 112 
>>> 
>>> 
>>> tyr fd1026 112 ssh sunpc1 ifconfig -a
>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 
>>> index 1
>>>       inet 127.0.0.1 netmask ff000000 
>>> nge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>       inet 193.174.26.210 netmask ffffffc0 broadcast 193.174.26.255
>>> tyr fd1026 113 
>>> 
>>> 
>>> tyr fd1026 113 ssh linpc1 /sbin/ifconfig -a
>>> eth0      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A8  
>>>         inet addr:193.174.26.208  Bcast:193.174.26.255  Mask:255.255.255.192
>>>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>         RX packets:18052524 errors:127 dropped:0 overruns:0 frame:127
>>>         TX packets:15917888 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:1000 
>>>         RX bytes:4158294157 (3965.6 Mb)  TX bytes:12060556809 (11501.8 Mb)
>>>         Interrupt:23 Base address:0x4000 
>>> 
>>> eth1      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A9  
>>>         BROADCAST MULTICAST  MTU:1500  Metric:1
>>>         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:1000 
>>>         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>>         Interrupt:45 Base address:0xa000 
>>> 
>>> lo        Link encap:Local Loopback  
>>>         inet addr:127.0.0.1  Mask:255.0.0.0
>>>         UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>         RX packets:1083 errors:0 dropped:0 overruns:0 frame:0
>>>         TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
>>>         collisions:0 txqueuelen:0 
>>>         RX bytes:329323 (321.6 Kb)  TX bytes:329323 (321.6 Kb)
>>> 
>>> tyr fd1026 114 
>>> 
>>> 
>>> Do you need something else?
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> 
>>> 
>>> 
>>>> On May 14, 2014, at 7:45 AM, Siegmar Gross 
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10
>>>>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
>>>>> Sun C5.12 and still have the following problem.
>>>>> 
>>>>> tyr fd1026 102 which mpiexec
>>>>> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>> ^CKilled by signal 2.
>>>>> Killed by signal 2.
>>>>> tyr fd1026 104 
>>>>> 
>>>>> 
>>>>> The command works fine with openmpi-1.6.6rc1.
>>>>> 
>>>>> tyr fd1026 102 which mpiexec
>>>>> /usr/local/openmpi-1.6.6_64_cc/bin/mpiexec
>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>> tyr.informatik.hs-fulda.de
>>>>> linpc1
>>>>> sunpc1
>>>>> tyr fd1026 104 
>>>>> 
>>>>> 
>>>>> I have reported the problem before and I would be grateful if
>>>>> somebody could solve it. Please let me know if I can provide any
>>>>> other information.
>>>>> 
>>>>> 
>>>>> Kind regards
>>>>> 
>>>>> Siegmar
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>> 
>> 
> 
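The ifconfig output quoted in this thread suggests that tyr sits on a different subnet than sunpc1 and linpc1. That would be consistent with the "UNFOUND KERNEL INDEX" and "MAY BE REACHABLE BY ROUTING" messages in the log. Here is a small Python sketch that checks this with the standard ipaddress module. The addresses and netmasks are copied from the ifconfig output. The script only illustrates the subnet arithmetic; it does not reproduce Open MPI's interface-matching logic.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output in this thread.
hosts = {
    "tyr": ("193.174.24.39", "255.255.255.224"),      # netmask ffffffe0
    "sunpc1": ("193.174.26.210", "255.255.255.192"),  # netmask ffffffc0
    "linpc1": ("193.174.26.208", "255.255.255.192"),
}

# Derive the network each interface belongs to.
networks = {
    name: ipaddress.ip_interface(f"{addr}/{mask}").network
    for name, (addr, mask) in hosts.items()
}
for name, net in networks.items():
    print(f"{name}: {net}")

# Check whether tyr's address falls inside either remote host's subnet.
head = ipaddress.ip_address(hosts["tyr"][0])
for name in ("sunpc1", "linpc1"):
    print(f"tyr's address in {name}'s subnet: {head in networks[name]}")
```

If the script reports that tyr's address is outside both remote subnets, the daemons on sunpc1 and linpc1 can only reach it through a router, which is the case the log describes.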
