Hi Ralph,

> Hmmm...well, that's an interesting naming scheme :-)
>
> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line
> and let's see what it thinks is happening

tyr fd1026 105 mpiexec -np 3 --host tyr,sunpc1,linpc1 --mca oob_base_verbose 10 --report-uri - hostname
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: registering oob components
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: found loaded component tcp
[tyr.informatik.hs-fulda.de:06877] mca: base: components_register: component tcp register function successful
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: opening oob components
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: found loaded component tcp
[tyr.informatik.hs-fulda.de:06877] mca: base: components_open: component tcp open function successful
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: checking available component tcp
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Querying component [tcp]
[tyr.informatik.hs-fulda.de:06877] oob:tcp: component_available called
[tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init creating module for V4 address on interface bge0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] creating OOB-TCP module for interface bge0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init adding 193.174.24.39 to our list of V4 connections
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP STARTUP
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] attempting to bind to IPv4 port 0
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] assigned IPv4 port 55567
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Adding component to end
[tyr.informatik.hs-fulda.de:06877] mca:oob:select: Found 1 active transports
3170566144.0;tcp://193.174.24.39:55567
[sunpc1:07690] mca: base: components_register: registering oob components
[sunpc1:07690] mca: base: components_register: found loaded component tcp
[sunpc1:07690] mca: base: components_register: component tcp register function successful
[sunpc1:07690] mca: base: components_open: opening oob components
[sunpc1:07690] mca: base: components_open: found loaded component tcp
[sunpc1:07690] mca: base: components_open: component tcp open function successful
[sunpc1:07690] mca:oob:select: checking available component tcp
[sunpc1:07690] mca:oob:select: Querying component [tcp]
[sunpc1:07690] oob:tcp: component_available called
[sunpc1:07690] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[sunpc1:07690] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[sunpc1:07690] [[48379,0],1] oob:tcp:init creating module for V4 address on interface nge0
[sunpc1:07690] [[48379,0],1] creating OOB-TCP module for interface nge0
[sunpc1:07690] [[48379,0],1] oob:tcp:init adding 193.174.26.210 to our list of V4 connections
[sunpc1:07690] [[48379,0],1] TCP STARTUP
[sunpc1:07690] [[48379,0],1] attempting to bind to IPv4 port 0
[sunpc1:07690] [[48379,0],1] assigned IPv4 port 39616
[sunpc1:07690] mca:oob:select: Adding component to end
[sunpc1:07690] mca:oob:select: Found 1 active transports
[sunpc1:07690] [[48379,0],1]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
[sunpc1:07690] [[48379,0],1]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
[sunpc1:07690] [[48379,0],1] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
[sunpc1:07690] [[48379,0],1] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
[sunpc1:07690] [[48379,0],1] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE nge0
[sunpc1:07690] [[48379,0],1] PASSING ADDR 193.174.24.39 TO INTERFACE nge0 AT KERNEL INDEX 2
[sunpc1:07690] [[48379,0],1]:tcp set addr for peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1]: peer [[48379,0],0] is reachable via component tcp
[sunpc1:07690] [[48379,0],1] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[sunpc1:07690] [[48379,0],1]:tcp:processing set_peer cmd for interface nge0
[sunpc1:07690] [[48379,0],1] oob:base:send to target [[48379,0],0]
[sunpc1:07690] [[48379,0],1] oob:tcp:send_nb to peer [[48379,0],0]:10
[sunpc1:07690] [[48379,0],1] tcp:send_nb to peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] tcp:send_nb: initiating connection to [[48379,0],0]
[sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0
[sunpc1:07690] [[48379,0],1] oob:tcp:peer creating socket to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0 on socket 10
[sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 193.174.26.210:39617
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (15, 11) 193.174.26.210:39617
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
[sunpc1:07690] [[48379,0],1] waiting for connect completion to [[48379,0],0] - activating send event
[sunpc1:07690] [[48379,0],1] tcp:send_handler called to send to peer [[48379,0],0]
[sunpc1:07690] [[48379,0],1] tcp:send_handler CONNECTING
[sunpc1:07690] [[48379,0],1]:tcp:complete_connect called for peer [[48379,0],0] on socket 10
[sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: sending ack to [[48379,0],0]
[sunpc1:07690] [[48379,0],1] SEND CONNECT ACK
[sunpc1:07690] [[48379,0],1] send blocking of 48 bytes to socket 10
[sunpc1:07690] [[48379,0],1] connect-ack sent to socket 10
[sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
[linpc1:21511] mca: base: components_register: registering oob components
[linpc1:21511] mca: base: components_register: found loaded component tcp
[linpc1:21511] mca: base: components_register: component tcp register function successful
[linpc1:21511] mca: base: components_open: opening oob components
[linpc1:21511] mca: base: components_open: found loaded component tcp
[linpc1:21511] mca: base: components_open: component tcp open function successful
[linpc1:21511] mca:oob:select: checking available component tcp
[linpc1:21511] mca:oob:select: Querying component [tcp]
[linpc1:21511] oob:tcp: component_available called
[linpc1:21511] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[linpc1:21511] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[linpc1:21511] [[48379,0],2] oob:tcp:init creating module for V4 address on interface eth0
[linpc1:21511] [[48379,0],2] creating OOB-TCP module for interface eth0
[linpc1:21511] [[48379,0],2] oob:tcp:init adding 193.174.26.208 to our list of V4 connections
[linpc1:21511] [[48379,0],2] TCP STARTUP
[linpc1:21511] [[48379,0],2] attempting to bind to IPv4 port 0
[linpc1:21511] [[48379,0],2] assigned IPv4 port 39724
[linpc1:21511] mca:oob:select: Adding component to end
[linpc1:21511] mca:oob:select: Found 1 active transports
[linpc1:21511] [[48379,0],2]: set_addr to uri 3170566144.0;tcp://193.174.24.39:55567
[linpc1:21511] [[48379,0],2]:set_addr checking if peer [[48379,0],0] is reachable via component tcp
[linpc1:21511] [[48379,0],2] oob:tcp: working peer [[48379,0],0] address tcp://193.174.24.39:55567
[linpc1:21511] [[48379,0],2] UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39
[linpc1:21511] [[48379,0],2] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE eth0
[linpc1:21511] [[48379,0],2] PASSING ADDR 193.174.24.39 TO INTERFACE eth0 AT KERNEL INDEX 2
[linpc1:21511] [[48379,0],2]:tcp set addr for peer [[48379,0],0]
[linpc1:21511] [[48379,0],2]: peer [[48379,0],0] is reachable via component tcp
[linpc1:21511] [[48379,0],2] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[linpc1:21511] [[48379,0],2]:tcp:processing set_peer cmd for interface eth0
[linpc1:21511] [[48379,0],2] oob:base:send to target [[48379,0],0]
[linpc1:21511] [[48379,0],2] oob:tcp:send_nb to peer [[48379,0],0]:10
[linpc1:21511] [[48379,0],2] tcp:send_nb to peer [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
[linpc1:21511] [[48379,0],2] tcp:send_nb: initiating connection to [[48379,0],0]
[linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0
[linpc1:21511] [[48379,0],2] oob:tcp:peer creating socket to [[48379,0],0]
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0 on socket 9
[linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
[linpc1:21511] [[48379,0],2] waiting for connect completion to [[48379,0],0] - activating send event
[linpc1:21511] [[48379,0],2] tcp:send_handler called to send to peer [[48379,0],0]
[linpc1:21511] [[48379,0],2] tcp:send_handler CONNECTING
[linpc1:21511] [[48379,0],2]:tcp:complete_connect called for peer [[48379,0],0] on socket 9
[linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: sending ack to [[48379,0],0]
[linpc1:21511] [[48379,0],2] SEND CONNECT ACK
[linpc1:21511] [[48379,0],2] send blocking of 48 bytes to socket 9
[linpc1:21511] [[48379,0],2] connect-ack sent to socket 9
[linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: setting read event on connection to [[48379,0],0]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (16, 11) 193.174.26.208:53741
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (16, 11) 193.174.26.208:53741
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
^CKilled by signal 2.
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],1]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],1]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target [[48379,0],2]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown peer [[48379,0],2]
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
Killed by signal 2.
[tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP SHUTDOWN
[tyr.informatik.hs-fulda.de:06877] mca: base: close: component tcp closed
[tyr.informatik.hs-fulda.de:06877] mca: base: close: unloading component tcp
tyr fd1026 106


Thank you very much for your help in advance. Do you need anything else?

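To relate the --report-uri line in the log (3170566144.0;tcp://193.174.24.39:55567) to the process names such as [[48379,0],0], here is a minimal Python sketch. It assumes that the part before ";" is "<jobid>.<vpid>" and that the 32-bit jobid packs the job family into its upper 16 bits and the local job number into its lower 16 bits; that packing is an assumption, not something stated in this thread. The function name decode_report_uri and the OrteContact type are hypothetical, and only a single tcp:// address is handled.

```python
from typing import NamedTuple


class OrteContact(NamedTuple):
    job_family: int
    local_job: int
    vpid: int
    host: str
    port: int


def decode_report_uri(uri: str) -> OrteContact:
    """Decode a string such as '3170566144.0;tcp://193.174.24.39:55567'.

    Assumption (not confirmed in this thread): the text before ';' is
    '<jobid>.<vpid>', and the 32-bit jobid holds the job family in its
    upper 16 bits and the local job number in its lower 16 bits.
    Only a single tcp://host:port contact is handled.
    """
    name, contact = uri.split(";", 1)
    jobid_str, vpid_str = name.split(".")
    jobid = int(jobid_str)

    scheme, _, hostport = contact.partition("://")
    if scheme != "tcp":
        raise ValueError(f"unexpected scheme: {scheme!r}")
    host, _, port = hostport.rpartition(":")

    return OrteContact(
        job_family=(jobid >> 16) & 0xFFFF,  # upper 16 bits
        local_job=jobid & 0xFFFF,           # lower 16 bits
        vpid=int(vpid_str),
        host=host,
        port=int(port),
    )


if __name__ == "__main__":
    c = decode_report_uri("3170566144.0;tcp://193.174.24.39:55567")
    # Print in the same [[family,job],vpid] form used in the log lines.
    print(f"[[{c.job_family},{c.local_job}],{c.vpid}] at {c.host}:{c.port}")
    # prints "[[48379,0],0] at 193.174.24.39:55567"
```

Under that assumption, 3170566144 is exactly 48379 * 65536, so the URI names the mpiexec process [[48379,0],0] on tyr, which is the peer that sunpc1 and linpc1 try to reach on port 55567.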
Kind regards

Siegmar


> On May 14, 2014, at 9:06 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi Ralph,
> >
> >> What are the interfaces on these machines?
> >
> > tyr fd1026 111 ifconfig -a
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> >         inet 127.0.0.1 netmask ff000000
> > bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
> >         inet 193.174.24.39 netmask ffffffe0 broadcast 193.174.24.63
> > tyr fd1026 112
> >
> >
> > tyr fd1026 112 ssh sunpc1 ifconfig -a
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> >         inet 127.0.0.1 netmask ff000000
> > nge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
> >         inet 193.174.26.210 netmask ffffffc0 broadcast 193.174.26.255
> > tyr fd1026 113
> >
> >
> > tyr fd1026 113 ssh linpc1 /sbin/ifconfig -a
> > eth0      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A8
> >           inet addr:193.174.26.208  Bcast:193.174.26.255  Mask:255.255.255.192
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:18052524 errors:127 dropped:0 overruns:0 frame:127
> >           TX packets:15917888 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:4158294157 (3965.6 Mb)  TX bytes:12060556809 (11501.8 Mb)
> >           Interrupt:23 Base address:0x4000
> >
> > eth1      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A9
> >           BROADCAST MULTICAST  MTU:1500  Metric:1
> >           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> >           Interrupt:45 Base address:0xa000
> >
> > lo        Link encap:Local Loopback
> >           inet addr:127.0.0.1  Mask:255.0.0.0
> >           UP LOOPBACK RUNNING  MTU:16436  Metric:1
> >           RX packets:1083 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:329323 (321.6 Kb)  TX bytes:329323 (321.6 Kb)
> >
> > tyr fd1026 114
> >
> >
> > Do you need something else?
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >> On May 14, 2014, at 7:45 AM, Siegmar Gross
> >> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10
> >>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
> >>> Sun C5.12 and still have the following problem.
> >>>
> >>> tyr fd1026 102 which mpiexec
> >>> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
> >>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
> >>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
> >>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION REQUEST ON UNKNOWN INTERFACE
> >>> ^CKilled by signal 2.
> >>> Killed by signal 2.
> >>> tyr fd1026 104
> >>>
> >>>
> >>> The command works fine with openmpi-1.6.6rc1.
> >>>
> >>> tyr fd1026 102 which mpiexec
> >>> /usr/local/openmpi-1.6.6_64_cc/bin/mpiexec
> >>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
> >>> tyr.informatik.hs-fulda.de
> >>> linpc1
> >>> sunpc1
> >>> tyr fd1026 104
> >>>
> >>>
> >>> I have reported the problem before and I would be grateful if
> >>> somebody could solve it. Please let me know if I can provide any
> >>> other information.
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >
>
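The ifconfig listings quoted above put tyr's bge0 (193.174.24.39, netmask ffffffe0) on a different IPv4 subnet from sunpc1's nge0 and linpc1's eth0 (193.174.26.210 and 193.174.26.208, netmask 255.255.255.192). This is consistent with the "UNFOUND KERNEL INDEX -13 ... MAY BE REACHABLE BY ROUTING" lines printed by both remote daemons in the verbose log. The Python sketch below uses the standard ipaddress module to confirm the subnets. The names INTERFACES, to_network and same_subnet are illustrative only, and whether this subnet mismatch is what makes tyr report "CONNECTION REQUEST ON UNKNOWN INTERFACE" is an assumption, not something established in this thread.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output quoted above.
# Solaris prints the mask in hex, Linux in dotted-quad notation.
INTERFACES = {
    "tyr/bge0": ("193.174.24.39", "ffffffe0"),
    "sunpc1/nge0": ("193.174.26.210", "ffffffc0"),
    "linpc1/eth0": ("193.174.26.208", "255.255.255.192"),
}


def to_network(addr: str, mask: str) -> ipaddress.IPv4Network:
    """Return the directly attached network for an address and netmask.

    Accepts either a hex mask ('ffffffe0') or a dotted-quad mask.
    """
    if "." not in mask:
        # Convert the hex mask to dotted-quad, e.g. ffffffe0 -> 255.255.255.224.
        mask = str(ipaddress.IPv4Address(int(mask, 16)))
    # strict=False lets us pass a host address instead of the network address.
    return ipaddress.IPv4Network(f"{addr}/{mask}", strict=False)


def same_subnet(a: str, b: str) -> bool:
    """True if interface b's address lies inside interface a's subnet.

    Assumption (not confirmed in this thread): this on-link check is a
    rough stand-in for how the OOB TCP code matches a peer to an interface.
    """
    net_a = to_network(*INTERFACES[a])
    addr_b = ipaddress.IPv4Address(INTERFACES[b][0])
    return addr_b in net_a


if __name__ == "__main__":
    for name, (addr, mask) in INTERFACES.items():
        print(name, to_network(addr, mask))
    print(same_subnet("tyr/bge0", "sunpc1/nge0"))     # False
    print(same_subnet("sunpc1/nge0", "linpc1/eth0"))  # True
```

Running it lists tyr/bge0 as 193.174.24.32/27 and both sunpc1/nge0 and linpc1/eth0 as 193.174.26.192/26, so traffic between tyr and the other two hosts has to be routed rather than delivered on-link.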