This bug should be fixed in tonight's tarball, BTW.

On May 15, 2014, at 9:19 AM, Ralph Castain <r...@open-mpi.org> wrote:

> It is an unrelated bug introduced by a different commit - causing mpirun to 
> segfault upon termination. The fact that you got the hostname to run 
> indicates that this original fix works, so at least we know the connection 
> logic is now okay.
> 
> Thanks
> Ralph
> 
> 
> On May 15, 2014, at 3:40 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
>> Hi Ralph,
>> 
>>> Just committed a potential fix to the trunk - please let me know
>>> if it worked for you
>> 
>> Now I get the hostnames but also a segmentation fault.
>> 
>> tyr fd1026 101 which mpiexec
>> /usr/local/openmpi-1.9_64_cc/bin/mpiexec
>> tyr fd1026 102 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>> tyr.informatik.hs-fulda.de
>> linpc1
>> sunpc1
>> [tyr:22835] *** Process received signal ***
>> [tyr:22835] Signal: Segmentation Fault (11)
>> [tyr:22835] Signal code: Address not mapped (1)
>> [tyr:22835] Failing at address: ffffffff7bf16de0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x1c
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x183960
>> /lib/sparcv9/libc.so.1:0xd8b98
>> /lib/sparcv9/libc.so.1:0xcc70c
>> /lib/sparcv9/libc.so.1:0xcc918
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1ce0e8
>>  [ Signal 2125151224 (?)]
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1ccde4
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_libevent2021_event_del+0x88
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_libevent2021_event_base_free+0x154
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:0x1bb9e8
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:mca_base_framework_close+0x1a0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-pal.so.0.0.0:opal_finalize+0xcc
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/lib64/libopen-rte.so.0.0.0:orte_finalize+0x168
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:orterun+0x23e0
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:main+0x24
>> /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun:_start+0x12c
>> [tyr:22835] *** End of error message ***
>> Segmentation fault
>> tyr fd1026 103 ompi_info | grep "revision:"
>>  Open MPI repo revision: r31769
>>  Open RTE repo revision: r31769
>>      OPAL repo revision: r31769
>> tyr fd1026 104 
>> 
>> 
>> 
>> I get the following output in "dbx".
>> 
>> tyr fd1026 104 /opt/solstudio12.3/bin/sparcv9/dbx 
>> /usr/local/openmpi-1.9_64_cc/bin/mpiexec 
>> For information about new features see `help changes'
>> To remove this message, put `dbxenv suppress_startup_message 7.9' in your 
>> .dbxrc
>> Reading mpiexec
>> Reading ld.so.1
>> Reading libopen-rte.so.0.0.0
>> Reading libopen-pal.so.0.0.0
>> Reading libsendfile.so.1
>> Reading libpicl.so.1
>> Reading libkstat.so.1
>> Reading liblgrp.so.1
>> Reading libsocket.so.1
>> Reading libnsl.so.1
>> Reading librt.so.1
>> Reading libm.so.2
>> Reading libthread.so.1
>> Reading libc.so.1
>> Reading libdoor.so.1
>> Reading libaio.so.1
>> Reading libmd.so.1
>> (dbx) run -np 3 --host tyr,sunpc1,linpc1 hostname
>> Running: mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname 
>> (process id 23328)
>> Reading libc_psr.so.1
>> Reading mca_shmem_mmap.so
>> Reading libmp.so.2
>> Reading libscf.so.1
>> Reading libuutil.so.1
>> Reading libgen.so.1
>> Reading mca_shmem_posix.so
>> Reading mca_shmem_sysv.so
>> Reading mca_sec_basic.so
>> Reading mca_ess_env.so
>> Reading mca_ess_hnp.so
>> Reading mca_ess_singleton.so
>> Reading mca_ess_tool.so
>> Reading mca_pstat_test.so
>> Reading mca_state_app.so
>> Reading mca_state_hnp.so
>> Reading mca_state_novm.so
>> Reading mca_state_orted.so
>> Reading mca_state_staged_hnp.so
>> Reading mca_state_staged_orted.so
>> Reading mca_state_tool.so
>> Reading mca_errmgr_default_app.so
>> Reading mca_errmgr_default_hnp.so
>> Reading mca_errmgr_default_orted.so
>> Reading mca_errmgr_default_tool.so
>> Reading mca_plm_isolated.so
>> Reading mca_plm_rsh.so
>> Reading mca_oob_tcp.so
>> Reading mca_rml_oob.so
>> Reading mca_routed_binomial.so
>> Reading mca_routed_debruijn.so
>> Reading mca_routed_direct.so
>> Reading mca_routed_radix.so
>> Reading mca_dstore_hash.so
>> Reading mca_grpcomm_bad.so
>> Reading mca_ras_simulator.so
>> Reading mca_rmaps_lama.so
>> Reading mca_rmaps_mindist.so
>> Reading mca_rmaps_ppr.so
>> Reading mca_rmaps_rank_file.so
>> Reading mca_rmaps_resilient.so
>> Reading mca_rmaps_round_robin.so
>> Reading mca_rmaps_seq.so
>> Reading mca_rmaps_staged.so
>> Reading mca_odls_default.so
>> Reading mca_rtc_hwloc.so
>> Reading mca_iof_hnp.so
>> Reading mca_iof_mr_hnp.so
>> Reading mca_iof_mr_orted.so
>> Reading mca_iof_orted.so
>> Reading mca_iof_tool.so
>> Reading mca_filem_raw.so
>> Reading mca_dfs_app.so
>> Reading mca_dfs_orted.so
>> Reading mca_dfs_test.so
>> tyr.informatik.hs-fulda.de
>> linpc1
>> sunpc1
>> t@1 (l@1) signal SEGV (no mapping at the fault address) in 
>> event_queue_remove at 0xffffffff7e9ce0e8
>> 0xffffffff7e9ce0e8: event_queue_remove+0x01a8:  stx      %l0, [%l3 + 24]
>> Current function is opal_event_base_close
>>   62       opal_event_base_free (opal_event_base);
>> 
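>> The crash site matches a classic libevent teardown-ordering hazard:
>> event_base_free() walks the base's internal queues and calls
>> event_del() on every event still linked in, so an event whose memory
>> was already released before the base is torn down makes that queue
>> unlink (the "stx %l0, [%l3 + 24]" above) write through a stale
>> pointer. A minimal standalone sketch of the safe ordering (plain
>> libevent 2.x, purely illustrative, not Open MPI's code):
>> 
>> #include <event2/event.h>
>> #include <sys/time.h>
>> 
>> static void cb(evutil_socket_t fd, short what, void *arg)
>> { (void)fd; (void)what; (void)arg; }
>> 
>> int main(void)
>> {
>>     struct event_base *base = event_base_new();
>>     struct event *ev = event_new(base, -1, EV_PERSIST, cb, NULL);
>>     struct timeval tv = { 1, 0 };
>>     event_add(ev, &tv);      /* ev is now linked into base's queues */
>> 
>>     /* Safe teardown order: unlink the event, release it, then free
>>      * the base.  If ev's memory were released while still linked
>>      * (e.g. by an earlier framework close), event_base_free() would
>>      * event_del() the stale event and crash as shown above. */
>>     event_del(ev);
>>     event_free(ev);
>>     event_base_free(base);
>>     return 0;
>> }
>> 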
>> (dbx) check -all
>> dbx: warning: check -all will be turned on in the next run of the process
>> access checking - OFF
>> memuse checking - OFF
>> 
>> (dbx) run -np 3 --host tyr,sunpc1,linpc1 hostname
>> Running: mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname 
>> (process id 23337)
>> Reading rtcapihook.so
>> Reading libdl.so.1
>> Reading rtcaudit.so
>> Reading libmapmalloc.so.1
>> Reading rtcboot.so
>> Reading librtc.so
>> Reading libmd_psr.so.1
>> RTC: Enabling Error Checking...
>> RTC: Using UltraSparc trap mechanism
>> RTC: See `help rtc showmap' and `help rtc limitations' for details.
>> RTC: Running program...
>> Write to unallocated (wua) on thread 1:
>> Attempting to write 1 byte at address 0xffffffff79f04000
>> t@1 (l@1) stopped in _readdir at 0xffffffff56574da0
>> 0xffffffff56574da0: _readdir+0x0064:    call     
>> _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff56742a80
>> Current function is find_dyn_components
>>  393                       if (0 != lt_dlforeachfile(dir, save_filename, 
>> NULL)) {
>> (dbx) 
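>> 
>> The RTC hit inside libc's _readdir may well be a checker false
>> positive, but it lands on a well-known Solaris convention that trips
>> memory checkers: struct dirent is declared with d_name[1], so a real
>> entry extends past the nominal struct size and any caller-allocated
>> entry buffer (e.g. for readdir_r) must be oversized. A hedged sketch
>> of the usual allocation rule (illustrative only, not libltdl's code;
>> alloc_dirent_buf is a made-up helper name):
>> 
>> #include <dirent.h>
>> #include <stddef.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> 
>> /* Allocate an entry buffer big enough for the longest name in 'dir'.
>>  * sizeof(struct dirent) alone is NOT enough on Solaris, where the
>>  * d_name field is declared as d_name[1]. */
>> struct dirent *alloc_dirent_buf(const char *dir)
>> {
>>     long name_max = pathconf(dir, _PC_NAME_MAX);
>>     if (name_max < 0)
>>         name_max = 255;                  /* conservative fallback */
>>     return malloc(offsetof(struct dirent, d_name)
>>                   + (size_t)name_max + 1);
>> }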
>> 
>> 
>> 
>> Do you need anything else?
>> 
>> 
>> Kind regards
>> 
>> Siegmar
>> 
>> 
>> 
>> 
>> On May 14, 2014, at 11:44 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>> 
>>>> Hi Ralph,
>>>> 
>>>>> Hmmm...well, that's an interesting naming scheme :-)
>>>>> 
>>>>> Try adding "-mca oob_base_verbose 10 --report-uri -" on your cmd line
>>>>> and let's see what it thinks is happening
>>>> 
>>>> 
>>>> tyr fd1026 105 mpiexec -np 3 --host tyr,sunpc1,linpc1 --mca 
>>>> oob_base_verbose 10 --report-uri - hostname
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: 
>>>> registering oob components
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: found 
>>>> loaded component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_register: 
>>>> component tcp register function successful
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: opening oob 
>>>> components
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: found 
>>>> loaded component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: components_open: component 
>>>> tcp open function successful
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: checking available 
>>>> component tcp
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Querying component [tcp]
>>>> [tyr.informatik.hs-fulda.de:06877] oob:tcp: component_available called
>>>> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 1 KERNEL INDEX 1 
>>>> FAMILY: V4
>>>> [tyr.informatik.hs-fulda.de:06877] WORKING INTERFACE 2 KERNEL INDEX 2 
>>>> FAMILY: V4
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init creating 
>>>> module for V4 address on interface bge0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] creating OOB-TCP module 
>>>> for interface bge0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:tcp:init adding 
>>>> 193.174.24.39 to our list of V4 connections
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP STARTUP
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] attempting to bind to 
>>>> IPv4 port 0
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] assigned IPv4 port 55567
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Adding component to end
>>>> [tyr.informatik.hs-fulda.de:06877] mca:oob:select: Found 1 active 
>>>> transports
>>>> 3170566144.0;tcp://193.174.24.39:55567
>>>> [sunpc1:07690] mca: base: components_register: registering oob components
>>>> [sunpc1:07690] mca: base: components_register: found loaded component tcp
>>>> [sunpc1:07690] mca: base: components_register: component tcp register 
>>>> function successful
>>>> [sunpc1:07690] mca: base: components_open: opening oob components
>>>> [sunpc1:07690] mca: base: components_open: found loaded component tcp
>>>> [sunpc1:07690] mca: base: components_open: component tcp open function 
>>>> successful
>>>> [sunpc1:07690] mca:oob:select: checking available component tcp
>>>> [sunpc1:07690] mca:oob:select: Querying component [tcp]
>>>> [sunpc1:07690] oob:tcp: component_available called
>>>> [sunpc1:07690] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [sunpc1:07690] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:init creating module for V4 address 
>>>> on interface nge0
>>>> [sunpc1:07690] [[48379,0],1] creating OOB-TCP module for interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:init adding 193.174.26.210 to our 
>>>> list of V4 connections
>>>> [sunpc1:07690] [[48379,0],1] TCP STARTUP
>>>> [sunpc1:07690] [[48379,0],1] attempting to bind to IPv4 port 0
>>>> [sunpc1:07690] [[48379,0],1] assigned IPv4 port 39616
>>>> [sunpc1:07690] mca:oob:select: Adding component to end
>>>> [sunpc1:07690] mca:oob:select: Found 1 active transports
>>>> [sunpc1:07690] [[48379,0],1]: set_addr to uri 
>>>> 3170566144.0;tcp://193.174.24.39:55567
>>>> [sunpc1:07690] [[48379,0],1]:set_addr checking if peer [[48379,0],0] is 
>>>> reachable via component tcp
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp: working peer [[48379,0],0] address 
>>>> tcp://193.174.24.39:55567
>>>> [sunpc1:07690] [[48379,0],1] UNFOUND KERNEL INDEX -13 FOR ADDRESS 
>>>> 193.174.24.39
>>>> [sunpc1:07690] [[48379,0],1] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE nge0
>>>> [sunpc1:07690] [[48379,0],1] PASSING ADDR 193.174.24.39 TO INTERFACE nge0 
>>>> AT KERNEL INDEX 2
>>>> [sunpc1:07690] [[48379,0],1]:tcp set addr for peer [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]: peer [[48379,0],0] is reachable via 
>>>> component tcp
>>>> [sunpc1:07690] [[48379,0],1] OOB_SEND: 
>>>> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [sunpc1:07690] [[48379,0],1]:tcp:processing set_peer cmd for interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:base:send to target [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:send_nb to peer [[48379,0],0]:10
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_nb to peer [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_nb: initiating connection to 
>>>> [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0
>>>> [sunpc1:07690] [[48379,0],1] oob:tcp:peer creating socket to [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface nge0 on socket 10
>>>> [sunpc1:07690] [[48379,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 193.174.26.210:39617
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (15, 11) 193.174.26.210:39617
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON 
>>>> UNKNOWN INTERFACE
>>>> [sunpc1:07690] [[48379,0],1] waiting for connect completion to 
>>>> [[48379,0],0] - activating send event
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_handler called to send to peer 
>>>> [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] tcp:send_handler CONNECTING
>>>> [sunpc1:07690] [[48379,0],1]:tcp:complete_connect called for peer 
>>>> [[48379,0],0] on socket 10
>>>> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: sending ack to 
>>>> [[48379,0],0]
>>>> [sunpc1:07690] [[48379,0],1] SEND CONNECT ACK
>>>> [sunpc1:07690] [[48379,0],1] send blocking of 48 bytes to socket 10
>>>> [sunpc1:07690] [[48379,0],1] connect-ack sent to socket 10
>>>> [sunpc1:07690] [[48379,0],1] tcp_peer_complete_connect: setting read event 
>>>> on connection to [[48379,0],0]
>>>> [linpc1:21511] mca: base: components_register: registering oob components
>>>> [linpc1:21511] mca: base: components_register: found loaded component tcp
>>>> [linpc1:21511] mca: base: components_register: component tcp register 
>>>> function successful
>>>> [linpc1:21511] mca: base: components_open: opening oob components
>>>> [linpc1:21511] mca: base: components_open: found loaded component tcp
>>>> [linpc1:21511] mca: base: components_open: component tcp open function 
>>>> successful
>>>> [linpc1:21511] mca:oob:select: checking available component tcp
>>>> [linpc1:21511] mca:oob:select: Querying component [tcp]
>>>> [linpc1:21511] oob:tcp: component_available called
>>>> [linpc1:21511] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [linpc1:21511] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:init creating module for V4 address 
>>>> on interface eth0
>>>> [linpc1:21511] [[48379,0],2] creating OOB-TCP module for interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:init adding 193.174.26.208 to our 
>>>> list of V4 connections
>>>> [linpc1:21511] [[48379,0],2] TCP STARTUP
>>>> [linpc1:21511] [[48379,0],2] attempting to bind to IPv4 port 0
>>>> [linpc1:21511] [[48379,0],2] assigned IPv4 port 39724
>>>> [linpc1:21511] mca:oob:select: Adding component to end
>>>> [linpc1:21511] mca:oob:select: Found 1 active transports
>>>> [linpc1:21511] [[48379,0],2]: set_addr to uri 
>>>> 3170566144.0;tcp://193.174.24.39:55567
>>>> [linpc1:21511] [[48379,0],2]:set_addr checking if peer [[48379,0],0] is 
>>>> reachable via component tcp
>>>> [linpc1:21511] [[48379,0],2] oob:tcp: working peer [[48379,0],0] address 
>>>> tcp://193.174.24.39:55567
>>>> [linpc1:21511] [[48379,0],2] UNFOUND KERNEL INDEX -13 FOR ADDRESS 
>>>> 193.174.24.39
>>>> [linpc1:21511] [[48379,0],2] PEER [[48379,0],0] MAY BE REACHABLE BY ROUTING - ASSIGNING MODULE AT KINDEX 2 INTERFACE eth0
>>>> [linpc1:21511] [[48379,0],2] PASSING ADDR 193.174.24.39 TO INTERFACE eth0 
>>>> AT KERNEL INDEX 2
>>>> [linpc1:21511] [[48379,0],2]:tcp set addr for peer [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]: peer [[48379,0],0] is reachable via 
>>>> component tcp
>>>> [linpc1:21511] [[48379,0],2] OOB_SEND: 
>>>> ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [linpc1:21511] [[48379,0],2]:tcp:processing set_peer cmd for interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:base:send to target [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:send_nb to peer [[48379,0],0]:10
>>>> [linpc1:21511] [[48379,0],2] tcp:send_nb to peer [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:508] post send to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:442] processing send to peer [[48379,0],0]:10
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:476] queue pending to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] tcp:send_nb: initiating connection to 
>>>> [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2]:[../../../../../openmpi-1.8.2a1r31742/orte/mca/oob/tcp/oob_tcp.c:490] connect to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0
>>>> [linpc1:21511] [[48379,0],2] oob:tcp:peer creating socket to [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] via interface eth0 on socket 9
>>>> [linpc1:21511] [[48379,0],2] orte_tcp_peer_try_connect: attempting to connect to proc [[48379,0],0] on 193.174.24.39:55567 - 0 retries
>>>> [linpc1:21511] [[48379,0],2] waiting for connect completion to 
>>>> [[48379,0],0] - activating send event
>>>> [linpc1:21511] [[48379,0],2] tcp:send_handler called to send to peer 
>>>> [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] tcp:send_handler CONNECTING
>>>> [linpc1:21511] [[48379,0],2]:tcp:complete_connect called for peer 
>>>> [[48379,0],0] on socket 9
>>>> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: sending ack to 
>>>> [[48379,0],0]
>>>> [linpc1:21511] [[48379,0],2] SEND CONNECT ACK
>>>> [linpc1:21511] [[48379,0],2] send blocking of 48 bytes to socket 9
>>>> [linpc1:21511] [[48379,0],2] connect-ack sent to socket 9
>>>> [linpc1:21511] [[48379,0],2] tcp_peer_complete_connect: setting read event 
>>>> on connection to [[48379,0],0]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] mca_oob_tcp_listen_thread: new connection: (16, 11) 193.174.26.208:53741
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] connection_handler: working connection (16, 11) 193.174.26.208:53741
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] CONNECTION REQUEST ON 
>>>> UNKNOWN INTERFACE
>>>> ^CKilled by signal 2.
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] OOB_SEND: ../../../../../openmpi-1.8.2a1r31742/orte/mca/rml/oob/rml_oob_send.c:199
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target 
>>>> [[48379,0],1]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown 
>>>> peer [[48379,0],1]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send to target 
>>>> [[48379,0],2]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] oob:base:send unknown 
>>>> peer [[48379,0],2]
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] is NOT reachable by TCP
>>>> Killed by signal 2.
>>>> [tyr.informatik.hs-fulda.de:06877] [[48379,0],0] TCP SHUTDOWN
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: close: component tcp closed
>>>> [tyr.informatik.hs-fulda.de:06877] mca: base: close: unloading component 
>>>> tcp
>>>> tyr fd1026 106 
>>>> 
>>>> 
>>>> Thank you very much in advance for your help. Do you need anything else?
>>>> 
>>>> 
>>>> Kind regards
>>>> 
>>>> Siegmar
>>>> 
>>>> 
>>>> 
>>>>> On May 14, 2014, at 9:06 AM, Siegmar Gross 
>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>> 
>>>>>> Hi Ralph,
>>>>>> 
>>>>>>> What are the interfaces on these machines?
>>>>>> 
>>>>>> tyr fd1026 111 ifconfig -a
>>>>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 
>>>>>> 8232 index 1
>>>>>>      inet 127.0.0.1 netmask ff000000 
>>>>>> bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>>>>      inet 193.174.24.39 netmask ffffffe0 broadcast 193.174.24.63
>>>>>> tyr fd1026 112 
>>>>>> 
>>>>>> 
>>>>>> tyr fd1026 112 ssh sunpc1 ifconfig -a
>>>>>> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 
>>>>>> 8232 index 1
>>>>>>      inet 127.0.0.1 netmask ff000000 
>>>>>> nge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
>>>>>>      inet 193.174.26.210 netmask ffffffc0 broadcast 193.174.26.255
>>>>>> tyr fd1026 113 
>>>>>> 
>>>>>> 
>>>>>> tyr fd1026 113 ssh linpc1 /sbin/ifconfig -a
>>>>>> eth0      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A8  
>>>>>>        inet addr:193.174.26.208  Bcast:193.174.26.255  
>>>>>> Mask:255.255.255.192
>>>>>>        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:18052524 errors:127 dropped:0 overruns:0 frame:127
>>>>>>        TX packets:15917888 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000 
>>>>>>        RX bytes:4158294157 (3965.6 Mb)  TX bytes:12060556809 (11501.8 Mb)
>>>>>>        Interrupt:23 Base address:0x4000 
>>>>>> 
>>>>>> eth1      Link encap:Ethernet  HWaddr 00:14:4F:23:FD:A9  
>>>>>>        BROADCAST MULTICAST  MTU:1500  Metric:1
>>>>>>        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:1000 
>>>>>>        RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>>>>>        Interrupt:45 Base address:0xa000 
>>>>>> 
>>>>>> lo        Link encap:Local Loopback  
>>>>>>        inet addr:127.0.0.1  Mask:255.0.0.0
>>>>>>        UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>>>>        RX packets:1083 errors:0 dropped:0 overruns:0 frame:0
>>>>>>        TX packets:1083 errors:0 dropped:0 overruns:0 carrier:0
>>>>>>        collisions:0 txqueuelen:0 
>>>>>>        RX bytes:329323 (321.6 Kb)  TX bytes:329323 (321.6 Kb)
>>>>>> 
>>>>>> tyr fd1026 114 
>>>>>> 
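>>>>>> These netmasks line up with the OOB messages above: tyr's bge0
>>>>>> (193.174.24.39, mask ffffffe0) is on a different IPv4 subnet than
>>>>>> sunpc1's nge0 and linpc1's eth0 (193.174.26.x, mask ffffffc0), so
>>>>>> the remote daemons find no local interface on tyr's network
>>>>>> ("UNFOUND KERNEL INDEX -13 FOR ADDRESS 193.174.24.39") and have to
>>>>>> assume reachability via routing. A small standalone sketch of that
>>>>>> subnet test (illustrative only, not Open MPI's actual matching
>>>>>> code):
>>>>>> 
>>>>>> #include <arpa/inet.h>
>>>>>> #include <stdio.h>
>>>>>> 
>>>>>> /* Two IPv4 addresses are on the same network iff they agree
>>>>>>  * under the interface netmask. */
>>>>>> static int same_subnet(const char *a, const char *b,
>>>>>>                        const char *mask)
>>>>>> {
>>>>>>     struct in_addr ia, ib, im;
>>>>>>     inet_pton(AF_INET, a, &ia);
>>>>>>     inet_pton(AF_INET, b, &ib);
>>>>>>     inet_pton(AF_INET, mask, &im);
>>>>>>     return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
>>>>>> }
>>>>>> 
>>>>>> int main(void)
>>>>>> {
>>>>>>     /* ffffffc0 == 255.255.255.192 */
>>>>>>     printf("%d\n", same_subnet("193.174.26.208", "193.174.26.210",
>>>>>>                                "255.255.255.192")); /* 1: same  */
>>>>>>     printf("%d\n", same_subnet("193.174.24.39", "193.174.26.210",
>>>>>>                                "255.255.255.192")); /* 0: not   */
>>>>>>     return 0;
>>>>>> }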
>>>>>> 
>>>>>> Do you need anything else?
>>>>>> 
>>>>>> 
>>>>>> Kind regards
>>>>>> 
>>>>>> Siegmar
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On May 14, 2014, at 7:45 AM, Siegmar Gross 
>>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I just installed openmpi-1.8.2a1r31742 on my machines (Solaris 10
>>>>>>>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
>>>>>>>> Sun C5.12 and still have the following problem.
>>>>>>>> 
>>>>>>>> tyr fd1026 102 which mpiexec
>>>>>>>> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
>>>>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>>>>> [tyr.informatik.hs-fulda.de:12827] [[37949,0],0] CONNECTION
>>>>>>>> REQUEST ON UNKNOWN INTERFACE
>>>>>>>> ^CKilled by signal 2.
>>>>>>>> Killed by signal 2.
>>>>>>>> tyr fd1026 104 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The command works fine with openmpi-1.6.6rc1.
>>>>>>>> 
>>>>>>>> tyr fd1026 102 which mpiexec
>>>>>>>> /usr/local/openmpi-1.6.6_64_cc/bin/mpiexec
>>>>>>>> tyr fd1026 103 mpiexec -np 3 --host tyr,sunpc1,linpc1 hostname
>>>>>>>> tyr.informatik.hs-fulda.de
>>>>>>>> linpc1
>>>>>>>> sunpc1
>>>>>>>> tyr fd1026 104 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I have reported this problem before and would be grateful if
>>>>>>>> somebody could solve it. Please let me know if I can provide any
>>>>>>>> other information.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards
>>>>>>>> 
>>>>>>>> Siegmar
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
