Actually, I don’t see any related changes in OMPI master, let alone the branches. So far as I can tell, the author never actually submitted the work.
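If anyone wants to double-check from a clone of the repo, a commit-message search across all refs should turn up anything that did land. The grep patterns below are just guesses at likely commit-message wording, so adjust as needed:

  $ git clone https://github.com/open-mpi/ompi.git && cd ompi
  $ # list commits on any branch whose message mentions the issue number or IPv6
  $ git log --all --oneline -i --grep='1585' --grep='ipv6'

An empty result would be consistent with the fix never having been submitted.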
> On Oct 19, 2017, at 3:57 PM, Mukkie <mukunthh...@gmail.com> wrote:
>
> FWIW, my issue is related to this one:
> https://github.com/open-mpi/ompi/issues/1585
>
> I have version 3.0.0, and the above issue is closed saying the fixes went into 3.1.0.
> However, I don't see the code changes for this issue.
>
> Cordially,
> Muku.
>
> On Wed, Oct 18, 2017 at 3:52 PM, Mukkie <mukunthh...@gmail.com> wrote:
> Thanks for your suggestion. However, my firewalls are already disabled on both machines.
>
> Cordially,
> Muku.
>
> On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> Looks like there is a firewall or something blocking communication between those nodes?
>
>> On Oct 18, 2017, at 1:29 PM, Mukkie <mukunthh...@gmail.com> wrote:
>>
>> Adding a verbose output. Please check for failures and advise. Thank you.
>>
>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose 100 --mca btl tcp,self ring_c
>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_plm_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
>> [ipv-rhel73:10575] mca: base: components_register: registering framework oob components
>> [ipv-rhel73:10575] mca: base: components_register: found loaded component tcp
>> [ipv-rhel73:10575] mca: base: components_register: component tcp register function successful
>> [ipv-rhel73:10575] mca: base: components_open: opening oob components
>> [ipv-rhel73:10575] mca: base: components_open: found loaded component tcp
>> [ipv-rhel73:10575] mca: base: components_open: component tcp open function successful
>> [ipv-rhel73:10575] mca:oob:select: checking available component tcp
>> [ipv-rhel73:10575] mca:oob:select: Querying component [tcp]
>> [ipv-rhel73:10575] oob:tcp: component_available called
>> [ipv-rhel73:10575] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init adding fe80::b9b:ac5d:9cf0:b858 to our list of V6 connections
>> [ipv-rhel73:10575] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init rejecting loopback interface lo
>> [ipv-rhel73:10575] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>> [ipv-rhel73:10575] [[20058,0],0] TCP STARTUP
>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv4 port 0
>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv4 port 53438
>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv6 port 0
>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv6 port 43370
>> [ipv-rhel73:10575] mca:oob:select: Adding component to end
>> [ipv-rhel73:10575] mca:oob:select: Found 1 active transports
>> [ipv-rhel73:10575] [[20058,0],0]: get transports
>> [ipv-rhel73:10575] [[20058,0],0]:get transports for component tcp
>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_ras_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: registering framework oob components
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: found loaded component tcp
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: component tcp register function successful
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: opening oob components
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: found loaded component tcp
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: component tcp open function successful
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: checking available component tcp
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Querying component [tcp]
>> [ipv-rhel71a.locallab.local:12299] oob:tcp: component_available called
>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init adding fe80::226:b9ff:fe85:6a28 to our list of V6 connections
>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init rejecting loopback interface lo
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] TCP STARTUP
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv4 port 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv4 port 50782
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv6 port 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv6 port 59268
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Adding component to end
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Found 1 active transports
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: get transports
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:get transports for component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: set_addr to uri 1314521088.0;tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:set_addr checking if peer [[20058,0],0] is reachable via component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp: working peer [[20058,0],0] address tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SET_PEER ADDING PEER [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] set_peer: peer [[20058,0],0] is listening on net fe80::b9b:ac5d:9cf0:b858 port 43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: peer [[20058,0],0] is reachable via component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] OOB_SEND: rml_oob_send.c:265
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:base:send to target [[20058,0],0] - attempt 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:send_nb to peer [[20058,0],0]:10 seq = -1
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:204] processing send to peer [[20058,0],0]:10 seq_num = -1 via [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:225] queue pending to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:send_nb: initiating connection to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:239] connect to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_close for [[20058,0],0] sd 20 state FAILED
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp_connection.c:356] connect to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:lost connection called for peer [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>>
>> * not finding the required libraries and/or binaries on
>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>
>> * lack of authority to execute on one or more specified nodes.
>>   Please verify your allocation and authorities.
>>
>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>   Please check with your sys admin to determine the correct location to use.
>>
>> * compilation of the orted with dynamic libraries when static are required
>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>   one of the contrib/platform definitions for your system type.
>>
>> * an inability to create a connection back to mpirun due to a
>>   lack of common network interfaces and/or no route found between
>>   them. Please check network connectivity (including firewalls
>>   and network routing requirements).
>> --------------------------------------------------------------------------
>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN
>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN done
>> [ipv-rhel73:10575] mca: base: close: component tcp closed
>> [ipv-rhel73:10575] mca: base: close: unloading component tcp
>>
>> Cordially,
>> Muku.
>>
>> On Wed, Oct 18, 2017 at 11:18 AM, Mukkie <mukunthh...@gmail.com> wrote:
>> Hi,
>>
>> I have two IPv6-only machines. I configured/built OMPI version 3.0 with --enable-ipv6.
>>
>> I want to verify a simple MPI communication call over TCP/IP between these two machines. I am using the ring_c and connectivity_c examples.
>>
>> Issuing from one of the host machines:
>>
>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca btl tcp,self --mca oob_base_verbose 100 ring_c
>> .
>> .
>> [ipv-rhel71a.locallab.local:10822] [[5331,0],1] tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
>>
>> where "host" contains the IPv6 address of the remote machine (namely, 'ipv-rhel71a').
>> Also, I have passwordless ssh set up to the remote machine.
>>
>> I will attach a verbose output in the follow-up post.
>>
>> Thanks.
>>
>> Cordially,
>>
>> Mukundhan Selvam
>> Development Engineer, HPC
>> http://www.mscsoftware.com/
>> 4675 MacArthur Court, Newport Beach, CA 92660
>> 714-540-8900 ext. 4166
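P.S. One sanity check worth doing outside of Open MPI entirely: the verbose log shows the OOB publishing and dialing the link-local (fe80::) addresses, so a raw IPv6 TCP test between the two nodes would confirm whether such a connection can work at all. A rough sketch, assuming the nmap-ncat that ships with RHEL 7; the interface name and port number below are placeholders for your setup:

  # on ipv-rhel73, listen on an arbitrary unused port
  $ nc -6 -l 5000
  # on ipv-rhel71a, connect to the other node's link-local address;
  # link-local targets need a %<interface> zone suffix
  $ nc -6 fe80::b9b:ac5d:9cf0:b858%eth0 5000

If typed text flows in both directions, basic IPv6 reachability is fine and the failure is more likely in the OOB/TCP code path than in the network.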
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users