OK, thanks. In that case, can we reopen this issue to get an update from the participants? (A minimal sketch of the setup under test is appended at the bottom of this mail, for reference.)
Cordially,
Muku.

On Thu, Oct 19, 2017 at 4:37 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Actually, I don’t see any related changes in OMPI master, let alone the
> branches. So far as I can tell, the author never actually submitted the
> work.
>
> On Oct 19, 2017, at 3:57 PM, Mukkie <mukunthh...@gmail.com> wrote:
>
> FWIW, my issue is related to this one:
> https://github.com/open-mpi/ompi/issues/1585
>
> I have version 3.0.0, and the above issue was closed saying the fixes went
> into 3.1.0. However, I don't see the code changes for this issue.
>
> Cordially,
> Muku.
>
> On Wed, Oct 18, 2017 at 3:52 PM, Mukkie <mukunthh...@gmail.com> wrote:
>
>> Thanks for your suggestion. However, my firewalls are already disabled on
>> both machines.
>>
>> Cordially,
>> Muku.
>>
>> On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>
>>> Looks like there is a firewall or something blocking communication
>>> between those nodes?
>>>
>>> On Oct 18, 2017, at 1:29 PM, Mukkie <mukunthh...@gmail.com> wrote:
>>>
>>> Adding a verbose output. Please check for the failures and advise. Thank you.
>>>
>>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose 100 --mca btl tcp,self ring_c
>>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_plm_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
>>> [ipv-rhel73:10575] mca: base: components_register: registering framework oob components
>>> [ipv-rhel73:10575] mca: base: components_register: found loaded component tcp
>>> [ipv-rhel73:10575] mca: base: components_register: component tcp register function successful
>>> [ipv-rhel73:10575] mca: base: components_open: opening oob components
>>> [ipv-rhel73:10575] mca: base: components_open: found loaded component tcp
>>> [ipv-rhel73:10575] mca: base: components_open: component tcp open function successful
>>> [ipv-rhel73:10575] mca:oob:select: checking available component tcp
>>> [ipv-rhel73:10575] mca:oob:select: Querying component [tcp]
>>> [ipv-rhel73:10575] oob:tcp: component_available called
>>> [ipv-rhel73:10575] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init adding fe80::b9b:ac5d:9cf0:b858 to our list of V6 connections
>>> [ipv-rhel73:10575] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init rejecting loopback interface lo
>>> [ipv-rhel73:10575] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>> [ipv-rhel73:10575] [[20058,0],0] TCP STARTUP
>>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv4 port 0
>>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv4 port 53438
>>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv6 port 0
>>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv6 port 43370
>>> [ipv-rhel73:10575] mca:oob:select: Adding component to end
>>> [ipv-rhel73:10575] mca:oob:select: Found 1 active transports
>>> [ipv-rhel73:10575] [[20058,0],0]: get transports
>>> [ipv-rhel73:10575] [[20058,0],0]:get transports for component tcp
>>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_ras_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: registering framework oob components
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: found loaded component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: component tcp register function successful
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: opening oob components
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: found loaded component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: component tcp open function successful
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: checking available component tcp
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Querying component [tcp]
>>> [ipv-rhel71a.locallab.local:12299] oob:tcp: component_available called
>>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init adding fe80::226:b9ff:fe85:6a28 to our list of V6 connections
>>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init rejecting loopback interface lo
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] TCP STARTUP
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv4 port 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv4 port 50782
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv6 port 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv6 port 59268
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Adding component to end
>>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Found 1 active transports
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: get transports
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:get transports for component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: set_addr to uri 1314521088.0;tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:set_addr checking if peer [[20058,0],0] is reachable via component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp: working peer [[20058,0],0] address tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SET_PEER ADDING PEER [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] set_peer: peer [[20058,0],0] is listening on net fe80::b9b:ac5d:9cf0:b858 port 43370
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: peer [[20058,0],0] is reachable via component tcp
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] OOB_SEND: rml_oob_send.c:265
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:base:send to target [[20058,0],0] - attempt 0
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:send_nb to peer [[20058,0],0]:10 seq = -1
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:204] processing send to peer [[20058,0],0]:10 seq_num = -1 via [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:225] queue pending to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:send_nb: initiating connection to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:239] connect to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_close for [[20058,0],0] sd 20 state FAILED
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp_connection.c:356] connect to [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:lost connection called for peer [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
>>> --------------------------------------------------------------------------
>>> ORTE was unable to reliably start one or more daemons.
>>> This usually is caused by:
>>>
>>> * not finding the required libraries and/or binaries on
>>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>
>>> * lack of authority to execute on one or more specified nodes.
>>>   Please verify your allocation and authorities.
>>>
>>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>>   Please check with your sys admin to determine the correct location to use.
>>>
>>> * compilation of the orted with dynamic libraries when static are required
>>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>>   one of the contrib/platform definitions for your system type.
>>>
>>> * an inability to create a connection back to mpirun due to a
>>>   lack of common network interfaces and/or no route found between
>>>   them. Please check network connectivity (including firewalls
>>>   and network routing requirements).
>>> --------------------------------------------------------------------------
>>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN
>>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN done
>>> [ipv-rhel73:10575] mca: base: close: component tcp closed
>>> [ipv-rhel73:10575] mca: base: close: unloading component tcp
>>>
>>> Cordially,
>>> Muku.
>>>
>>> On Wed, Oct 18, 2017 at 11:18 AM, Mukkie <mukunthh...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have two IPv6-only machines. I configured/built OMPI version 3.0 with
>>>> --enable-ipv6.
>>>>
>>>> I want to verify a simple MPI communication call over TCP/IP between
>>>> these two machines. I am using the ring_c and connectivity_c examples.
>>>>
>>>> Issuing from one of the host machines…
>>>>
>>>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca btl tcp,self --mca oob_base_verbose 100 ring_c
>>>> .
>>>> .
>>>> [ipv-rhel71a.locallab.local:10822] [[5331,0],1] tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>>>>
>>>> where “host” contains the IPv6 address of the remote machine (namely,
>>>> 'ipv-rhel71a'). I also have passwordless ssh set up to the remote machine.
>>>>
>>>> I will attach a verbose output in the follow-up post.
>>>>
>>>> Thanks.
>>>>
>>>> Cordially,
>>>>
>>>> Mukundhan Selvam
>>>> Development Engineer, HPC
>>>> MSC Software <http://www.mscsoftware.com/>
>>>> 4675 MacArthur Court, Newport Beach, CA 92660
>>>> 714-540-8900 ext. 4166
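For reference, here is a minimal sketch of the setup described in the quoted report, in case anyone wants to reproduce it. Only the --enable-ipv6 configure flag and the mpirun line are taken from the report itself; the install prefix, the hostfile contents, and the firewall checks are illustrative assumptions (placeholders, not our actual lab hosts or paths).

# Build OMPI 3.0 with IPv6 support (prefix is a placeholder).
./configure --enable-ipv6 --prefix=$HOME/ompi-3.0
make -j4 && make install

# Hostfile naming the remote IPv6-only node (placeholder hostname;
# the real file contains the remote machine's IPv6 address).
cat > host <<'EOF'
ipv-rhel71a slots=1
EOF

# Double-check that no firewall is active on either node (RHEL 7).
systemctl status firewalld
ip6tables -L -n

# Run the ring example over TCP with verbose OOB output, as in the logs above.
mpirun -hostfile host --mca btl tcp,self --mca oob_base_verbose 100 ring_c

Since the verbose output shows the OOB layer picking the link-local fe80:: addresses, it may also be worth trying to pin it to a specific interface with something like --mca oob_tcp_if_include <iface>, but I have not verified whether that helps on this setup.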
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users