OK, thanks. In that case, can we reopen this issue to get an update
from the participants?


Cordially,
Muku.

On Thu, Oct 19, 2017 at 4:37 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Actually, I don’t see any related changes in OMPI master, let alone the
> branches. So far as I can tell, the author never actually submitted the
> work.
>
>
> On Oct 19, 2017, at 3:57 PM, Mukkie <mukunthh...@gmail.com> wrote:
>
> FWIW, my issue is related to this one.
> https://github.com/open-mpi/ompi/issues/1585
>
> I have version 3.0.0, and the above issue was closed saying the fixes went
> into 3.1.0. However, I don't see the code changes for this issue.
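>
> (For reference, this is the sort of check I mean; the branch name and grep
> patterns below are only guesses at how the fix would be identified.)
>
> git clone https://github.com/open-mpi/ompi.git && cd ompi
> git log --oneline -i --grep=ipv6 origin/v3.0.x   # IPv6-related commits on the 3.0 branch
> git log --all --oneline --grep=1585              # commits that mention the issue number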
>
> Cordially,
> Muku.
>
> On Wed, Oct 18, 2017 at 3:52 PM, Mukkie <mukunthh...@gmail.com> wrote:
>
>> Thanks for your suggestion. However, my firewalls are already disabled on
>> both machines.
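>>
>> (For reference, something like the following should confirm that on each
>> RHEL host; the exact commands depend on how the firewall is managed.)
>>
>> systemctl status firewalld   # should show inactive/disabled
>> iptables -L -n               # IPv4 rules, expect empty ACCEPT chains
>> ip6tables -L -n              # IPv6 rules, which are the relevant ones here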
>>
>> Cordially,
>> Muku.
>>
>> On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org <r...@open-mpi.org>
>> wrote:
>>
>>> Looks like there is a firewall or something blocking communication
>>> between those nodes?
>>>
>>> On Oct 18, 2017, at 1:29 PM, Mukkie <mukunthh...@gmail.com> wrote:
>>>
>>> Adding verbose output. Please check where it fails and advise. Thank you.
>>>
>>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca
>>> oob_base_verbose 100 --mca btl tcp,self ring_c
>>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open
>>> mca_plm_tm: libtorque.so.2: cannot open shared object file: No such file or
>>> directory (ignored)
>>> [ipv-rhel73:10575] mca: base: components_register: registering framework
>>> oob components
>>> [ipv-rhel73:10575] mca: base: components_register: found loaded
>>> component tcp
>>> [ipv-rhel73:10575] mca: base: components_register: component tcp
>>> register function successful
>>> [ipv-rhel73:10575] mca: base: components_open: opening oob components
>>> [ipv-rhel73:10575] mca: base: components_open: found loaded component tcp
>>> [ipv-rhel73:10575] mca: base: components_open: component tcp open
>>> function successful
>>> [ipv-rhel73:10575] mca:oob:select: checking available component tcp
>>> [ipv-rhel73:10575] mca:oob:select: Querying component [tcp]
>>> [ipv-rhel73:10575] oob:tcp: component_available called
>>> [ipv-rhel73:10575] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init adding
>>> fe80::b9b:ac5d:9cf0:b858 to our list of V6 connections
>>> [ipv-rhel73:10575] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init rejecting loopback
>>> interface lo
>>> [ipv-rhel73:10575] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>> [ipv-rhel73:10575] [[20058,0],0] TCP STARTUP
>>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv4 port 0
>>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv4 port 53438
>>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv6 port 0
>>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv6 port 43370
>>> [ipv-rhel73:10575] mca:oob:select: Adding component to end
>>> [ipv-rhel73:10575] mca:oob:select: Found 1 active transports
>>> [ipv-rhel73:10575] [[20058,0],0]: get transports
>>> [ipv-rhel73:10575] [[20058,0],0]:get transports for component tcp
>>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open
>>> mca_ras_tm: libtorque.so.2: cannot open shared object file: No such file or
>>> directory (ignored)
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register:
>>> registering framework oob components
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register:
>>> found loaded component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register:
>>> component tcp register function successful
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: opening
>>> oob components
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: found
>>> loaded component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open:
>>> component tcp open function successful
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: checking available
>>> component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Querying component
>>> [tcp]
>>> [ipv-rhel71a.locallab.local:12299] oob:tcp: component_available called
>>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 1 KERNEL INDEX 2
>>> FAMILY: V6
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init adding
>>> fe80::226:b9ff:fe85:6a28 to our list of V6 connections
>>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 2 KERNEL INDEX 1
>>> FAMILY: V4
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init rejecting
>>> loopback interface lo
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] TCP STARTUP
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to
>>> IPv4 port 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv4 port
>>> 50782
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to
>>> IPv6 port 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv6 port
>>> 59268
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Adding component to
>>> end
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Found 1 active
>>> transports
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: get transports
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:get transports for
>>> component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: set_addr to uri
>>> 1314521088.0;tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:set_addr checking if
>>> peer [[20058,0],0] is reachable via component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp: working peer
>>> [[20058,0],0] address tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SET_PEER ADDING PEER
>>> [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] set_peer: peer
>>> [[20058,0],0] is listening on net fe80::b9b:ac5d:9cf0:b858 port 43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: peer [[20058,0],0] is
>>> reachable via component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] OOB_SEND:
>>> rml_oob_send.c:265
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:base:send to
>>> target [[20058,0],0] - attempt 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:send_nb to
>>> peer [[20058,0],0]:10 seq = -1
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:204]
>>> processing send to peer [[20058,0],0]:10 seq_num = -1 via [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:225] queue
>>> pending to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:send_nb:
>>> initiating connection to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:239]
>>> connect to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on
>>> socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on
>>> (null):-1 - 0 retries
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72
>>> bytes to socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_close for
>>> [[20058,0],0] sd 20 state FAILED
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp_connection.c:356]
>>> connect to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:lost connection
>>> called for peer [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on
>>> socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on
>>> (null):-1 - 0 retries
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]
>>> orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72
>>> bytes to socket 20
>>> ------------------------------------------------------------
>>> --------------
>>> ORTE was unable to reliably start one or more daemons.
>>> This usually is caused by:
>>>
>>> * not finding the required libraries and/or binaries on
>>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>
>>> * lack of authority to execute on one or more specified nodes.
>>>   Please verify your allocation and authorities.
>>>
>>> * the inability to write startup files into /tmp
>>> (--tmpdir/orte_tmpdir_base).
>>>   Please check with your sys admin to determine the correct location to
>>> use.
>>>
>>> *  compilation of the orted with dynamic libraries when static are
>>> required
>>>   (e.g., on Cray). Please check your configure cmd line and consider
>>> using
>>>   one of the contrib/platform definitions for your system type.
>>>
>>> * an inability to create a connection back to mpirun due to a
>>>   lack of common network interfaces and/or no route found between
>>>   them. Please check network connectivity (including firewalls
>>>   and network routing requirements).
>>> ------------------------------------------------------------
>>> --------------
>>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN
>>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN done
>>> [ipv-rhel73:10575] mca: base: close: component tcp closed
>>> [ipv-rhel73:10575] mca: base: close: unloading component tcp
>>>
>>> Cordially,
>>> Muku.
>>>
>>>
>>> On Wed, Oct 18, 2017 at 11:18 AM, Mukkie <mukunthh...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have two IPv6-only machines. I configured/built OMPI version 3.0 with
>>>> --enable-ipv6.
>>>>
>>>> I want to verify a simple MPI communication call over TCP/IP between
>>>> these two machines. I am using the ring_c and connectivity_c examples.
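>>>>
>>>> (Roughly, the build and example setup looks like this; the install prefix
>>>> is only a placeholder.)
>>>>
>>>> ./configure --prefix=/opt/openmpi-3.0.0 --enable-ipv6
>>>> make -j && make install
>>>> cd examples && make    # builds ring_c, connectivity_c, etc.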
>>>>
>>>>
>>>> Issuing from one of the host machines…
>>>>
>>>> [mselvam@ipv-rhel73 examples]$  mpirun -hostfile host --mca btl
>>>> tcp,self --mca oob_base_verbose 100 ring_c
>>>>
>>>> .
>>>> .
>>>>
>>>> [ipv-rhel71a.locallab.local:10822] [[5331,0],1]
>>>> tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>>>>
>>>>
>>>> where “host” contains the IPv6 address of the remote machine (namely
>>>> ‘ipv-rhel71a’). Also, I have passwordless ssh set up to the remote machine.
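>>>>
>>>> (For completeness, the host file is a single line with that address; the
>>>> value shown below is only a placeholder.)
>>>>
>>>> $ cat host
>>>> fd00::2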
>>>>
>>>>
>>>> I will attach a verbose output in the follow-up post.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Cordially,
>>>>
>>>>
>>>>
>>>> Mukundhan Selvam
>>>>
>>>> Development Engineer, HPC
>>>>
>>>> MSC Software <http://www.mscsoftware.com/>
>>>>
>>>> 4675 MacArthur Court, Newport Beach, CA 92660
>>>>
>>>> 714-540-8900 ext. 4166
>>>>
>>>
>>
>>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
