Forgive me, but I am now fully confused - case 1 and case 3 appear identical to me, except for the debug-daemons flag on case 3.
On Jul 15, 2014, at 7:56 AM, Ricardo Fernández-Perea <rfernandezpe...@gmail.com> wrote:
> What I mean by "another mpi process": I have 4 nodes where there are processes that use MPI, which were initiated using mpirun from the control node and are already running. When I run the command against any of those nodes it executes, but when I do it against any other node it fails if no_tree_spawn flag is used it works OK
>
> Case 1: it fails
>
> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca oob_base_verbose 10 -mca plm_base_verbose 10 -host nexus17 ompi_info
>
> [nexus10.nlroc:31321] mca: base: components_register: registering plm > components > [nexus10.nlroc:31321] mca: base: components_register: found loaded component > isolated > [nexus10.nlroc:31321] mca: base: components_register: component isolated has > no register or open function > [nexus10.nlroc:31321] mca: base: components_register: found loaded component > rsh > [nexus10.nlroc:31321] mca: base: components_register: component rsh register > function successful > [nexus10.nlroc:31321] mca: base: components_register: found loaded component > slurm > [nexus10.nlroc:31321] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:31321] mca: base: components_open: opening plm components > [nexus10.nlroc:31321] mca: base: components_open: found loaded component > isolated > [nexus10.nlroc:31321] mca: base: components_open: component isolated open > function successful > [nexus10.nlroc:31321] mca: base: components_open: found loaded component rsh > [nexus10.nlroc:31321] mca: base: components_open: component rsh open function > successful > [nexus10.nlroc:31321] mca: base: components_open: found loaded component slurm > [nexus10.nlroc:31321] mca: base: components_open: component slurm open > function successful > [nexus10.nlroc:31321] mca:base:select: Auto-selecting plm components > [nexus10.nlroc:31321] mca:base:select:( plm) Querying component [isolated] >
[nexus10.nlroc:31321] mca:base:select:( plm) Query of component [isolated] > set priority to 0 > [nexus10.nlroc:31321] mca:base:select:( plm) Querying component [rsh] > [nexus10.nlroc:31321] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus10.nlroc:31321] mca:base:select:( plm) Querying component [slurm] > [nexus10.nlroc:31321] mca:base:select:( plm) Skipping component [slurm]. > Query failed to return a module > [nexus10.nlroc:31321] mca:base:select:( plm) Selected component [rsh] > [nexus10.nlroc:31321] mca: base: close: component isolated closed > [nexus10.nlroc:31321] mca: base: close: unloading component isolated > [nexus10.nlroc:31321] mca: base: close: component slurm closed > [nexus10.nlroc:31321] mca: base: close: unloading component slurm > [nexus10.nlroc:31321] mca: base: components_register: registering oob > components > [nexus10.nlroc:31321] mca: base: components_register: found loaded component > tcp > [nexus10.nlroc:31321] mca: base: components_register: component tcp register > function successful > [nexus10.nlroc:31321] mca: base: components_open: opening oob components > [nexus10.nlroc:31321] mca: base: components_open: found loaded component tcp > [nexus10.nlroc:31321] mca: base: components_open: component tcp open function > successful > [nexus10.nlroc:31321] mca:oob:select: checking available component tcp > [nexus10.nlroc:31321] mca:oob:select: Querying component [tcp] > [nexus10.nlroc:31321] oob:tcp: component_available called > [nexus10.nlroc:31321] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 > [nexus10.nlroc:31321] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 > [nexus10.nlroc:31321] [[56634,0],0] oob:tcp:init creating module for V4 > address on interface en0 > [nexus10.nlroc:31321] [[56634,0],0] oob:tcp:init adding 172.16.1.10 to our > list of V4 connections > [nexus10.nlroc:31321] [[56634,0],0] TCP STARTUP > [nexus10.nlroc:31321] [[56634,0],0] attempting to bind to IPv4 port 0 > [nexus10.nlroc:31321] 
[[56634,0],0] assigned IPv4 port 50898 > [nexus10.nlroc:31321] mca:oob:select: Adding component to end > [nexus10.nlroc:31321] mca:oob:select: Found 1 active transports
>
> I Ctrl-C here when it hangs
>
> ^C[nexus10.nlroc:31321] [[56634,0],0] OOB_SEND: rml_oob_send.c:199 > [nexus10.nlroc:31321] [[56634,0],0] oob:base:send to target [[56634,0],1] > [nexus10.nlroc:31321] [[56634,0],0] oob:base:send unknown peer [[56634,0],1] > [nexus10.nlroc:31321] [[56634,0],0] is NOT reachable by TCP > [nexus10.nlroc:31321] mca: base: close: component rsh closed > [nexus10.nlroc:31321] mca: base: close: unloading component rsh > [nexus10.nlroc:31321] [[56634,0],0] TCP SHUTDOWN > [nexus10.nlroc:31321] mca: base: close: component tcp closed > [nexus10.nlroc:31321] mca: base: close: unloading component tcp
>
> Case 2: the same node, but without the rsh_no_tree flag
>
> /opt/openmpi/bin/mpirun -mca oob_base_verbose 10 -mca plm_base_verbose 10 -host nexus17 ompi_info
>
> [nexus10.nlroc:31369] mca: base: components_register: registering plm > components > [nexus10.nlroc:31369] mca: base: components_register: found loaded component > isolated > [nexus10.nlroc:31369] mca: base: components_register: component isolated has > no register or open function > [nexus10.nlroc:31369] mca: base: components_register: found loaded component > rsh > [nexus10.nlroc:31369] mca: base: components_register: component rsh register > function successful > [nexus10.nlroc:31369] mca: base: components_register: found loaded component > slurm > [nexus10.nlroc:31369] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:31369] mca: base: components_open: opening plm components > [nexus10.nlroc:31369] mca: base: components_open: found loaded component > isolated > [nexus10.nlroc:31369] mca: base: components_open: component isolated open > function successful > [nexus10.nlroc:31369] mca: base: components_open: found loaded component rsh > [nexus10.nlroc:31369] mca: base: 
components_open: component rsh open function > successful > [nexus10.nlroc:31369] mca: base: components_open: found loaded component slurm > [nexus10.nlroc:31369] mca: base: components_open: component slurm open > function successful > [nexus10.nlroc:31369] mca:base:select: Auto-selecting plm components > [nexus10.nlroc:31369] mca:base:select:( plm) Querying component [isolated] > [nexus10.nlroc:31369] mca:base:select:( plm) Query of component [isolated] > set priority to 0 > [nexus10.nlroc:31369] mca:base:select:( plm) Querying component [rsh] > [nexus10.nlroc:31369] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus10.nlroc:31369] mca:base:select:( plm) Querying component [slurm] > [nexus10.nlroc:31369] mca:base:select:( plm) Skipping component [slurm]. > Query failed to return a module > [nexus10.nlroc:31369] mca:base:select:( plm) Selected component [rsh] > [nexus10.nlroc:31369] mca: base: close: component isolated closed > [nexus10.nlroc:31369] mca: base: close: unloading component isolated > [nexus10.nlroc:31369] mca: base: close: component slurm closed > [nexus10.nlroc:31369] mca: base: close: unloading component slurm > [nexus10.nlroc:31369] mca: base: components_register: registering oob > components > [nexus10.nlroc:31369] mca: base: components_register: found loaded component > tcp > [nexus10.nlroc:31369] mca: base: components_register: component tcp register > function successful > [nexus10.nlroc:31369] mca: base: components_open: opening oob components > [nexus10.nlroc:31369] mca: base: components_open: found loaded component tcp > [nexus10.nlroc:31369] mca: base: components_open: component tcp open function > successful > [nexus10.nlroc:31369] mca:oob:select: checking available component tcp > [nexus10.nlroc:31369] mca:oob:select: Querying component [tcp] > [nexus10.nlroc:31369] oob:tcp: component_available called > [nexus10.nlroc:31369] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 > [nexus10.nlroc:31369] WORKING INTERFACE 
2 KERNEL INDEX 2 FAMILY: V4 > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:init creating module for V4 > address on interface en0 > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:init adding 172.16.1.10 to our > list of V4 connections > [nexus10.nlroc:31369] [[56810,0],0] TCP STARTUP > [nexus10.nlroc:31369] [[56810,0],0] attempting to bind to IPv4 port 0 > [nexus10.nlroc:31369] [[56810,0],0] assigned IPv4 port 50908 > [nexus10.nlroc:31369] mca:oob:select: Adding component to end > [nexus10.nlroc:31369] mca:oob:select: Found 1 active transports > [nexus17.nlroc:60584] mca: base: components_register: registering plm > components > [nexus17.nlroc:60584] mca: base: components_register: found loaded component > rsh > [nexus17.nlroc:60584] mca: base: components_register: component rsh register > function successful > [nexus17.nlroc:60584] mca: base: components_open: opening plm components > [nexus17.nlroc:60584] mca: base: components_open: found loaded component rsh > [nexus17.nlroc:60584] mca: base: components_open: component rsh open function > successful > [nexus17.nlroc:60584] mca:base:select: Auto-selecting plm components > [nexus17.nlroc:60584] mca:base:select:( plm) Querying component [rsh] > [nexus17.nlroc:60584] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus17.nlroc:60584] mca:base:select:( plm) Selected component [rsh] > [nexus17.nlroc:60584] mca: base: components_register: registering oob > components > [nexus17.nlroc:60584] mca: base: components_register: found loaded component > tcp > [nexus17.nlroc:60584] mca: base: components_register: component tcp register > function successful > [nexus17.nlroc:60584] mca: base: components_open: opening oob components > [nexus17.nlroc:60584] mca: base: components_open: found loaded component tcp > [nexus17.nlroc:60584] mca: base: components_open: component tcp open function > successful > [nexus17.nlroc:60584] mca:oob:select: checking available component tcp > [nexus17.nlroc:60584] 
mca:oob:select: Querying component [tcp] > [nexus17.nlroc:60584] oob:tcp: component_available called > [nexus17.nlroc:60584] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 > [nexus17.nlroc:60584] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init creating module for V4 > address on interface en0 > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init adding 172.16.1.17 to our > list of V4 connections > [nexus17.nlroc:60584] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4 > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init creating module for V4 > address on interface en2 > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init adding 169.254.210.43 to our > list of V4 connections > [nexus17.nlroc:60584] [[56810,0],1] TCP STARTUP > [nexus17.nlroc:60584] [[56810,0],1] attempting to bind to IPv4 port 0 > [nexus17.nlroc:60584] [[56810,0],1] assigned IPv4 port 54613 > [nexus17.nlroc:60584] mca:oob:select: Adding component to end > [nexus17.nlroc:60584] mca:oob:select: Found 1 active transports > [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri > 3723100160.0;tcp://172.16.1.10:50908 > [nexus17.nlroc:60584] [[56810,0],1]:set_addr checking if peer [[56810,0],0] > is reachable via component tcp > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp: working peer [[56810,0],0] > address tcp://172.16.1.10:50908 > [nexus17.nlroc:60584] [[56810,0],1] PEER [[56810,0],0] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus17.nlroc:60584] [[56810,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus17.nlroc:60584] [[56810,0],1]:tcp set addr for peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]: peer [[56810,0],0] is reachable via > component tcp > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1]:tcp:processing set_peer cmd for interface > en0 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] 
oob:tcp:send_nb to peer [[56810,0],0]:10 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:10 > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:476] queue pending to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: initiating connection to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:490] connect to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56810,0],0] via interface en0 > [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56810,0],0] via interface en0 on socket 9 > [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56810,0],0] on 172.16.1.10:50908 - 0 retries > [nexus17.nlroc:60584] [[56810,0],1] waiting for connect completion to > [[56810,0],0] - activating send event > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler CONNECTING > [nexus17.nlroc:60584] [[56810,0],1]:tcp:complete_connect called for peer > [[56810,0],0] on socket 9 > [nexus17.nlroc:60584] [[56810,0],1] tcp_peer_complete_connect: sending ack to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] SEND CONNECT ACK > [nexus17.nlroc:60584] [[56810,0],1] send blocking of 40 bytes to socket 9 > [nexus17.nlroc:60584] [[56810,0],1] connect-ack sent to socket 9 > [nexus17.nlroc:60584] [[56810,0],1] tcp_peer_complete_connect: setting read > event on connection to [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0] mca_oob_tcp_listen_thread: new > connection: (12, 0) 172.16.1.17:54614 > [nexus10.nlroc:31369] [[56810,0],0] connection_handler: working connection > (12, 35) 172.16.1.17:54614 > [nexus10.nlroc:31369] 
[[56810,0],0] accept_connection: 172.16.1.17:54614 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called > [nexus10.nlroc:31369] [[56810,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 12 > [nexus10.nlroc:31369] [[56810,0],0] waiting for connect ack from UNKNOWN > [nexus10.nlroc:31369] [[56810,0],0] connect ack received from UNKNOWN > [nexus10.nlroc:31369] [[56810,0],0] connect-ack recvd from UNKNOWN > [nexus10.nlroc:31369] [[56810,0],0] mca_oob_tcp_recv_connect: connection from > new peer > [nexus10.nlroc:31369] [[56810,0],0] connect-ack header from [[56810,0],1] is > okay > [nexus10.nlroc:31369] [[56810,0],0] waiting for connect ack from [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] connect ack received from [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] connect-ack version from [[56810,0],1] > matches ours > [nexus10.nlroc:31369] [[56810,0],0] connect-ack [[56810,0],1] authenticated > [nexus10.nlroc:31369] [[56810,0],0] tcp:peer_accept called for peer > [[56810,0],1] in state UNKNOWN on socket 12 > [nexus10.nlroc:31369] [[56810,0],0] SEND CONNECT ACK > [nexus10.nlroc:31369] [[56810,0],0] send blocking of 40 bytes to socket 12 > [nexus10.nlroc:31369] [[56810,0],0] connect-ack sent to socket 12 > [nexus10.nlroc:31369] [[56810,0],0]-[[56810,0],1] tcp_peer_connected on > socket 12 > [nexus10.nlroc:31369] [[56810,0],0]-[[56810,0],1] accepted: 172.16.1.10 - > 172.16.1.17 nodelay 0 sndbuf 131072 rcvbuf 131072 flags 00000006 > [nexus10.nlroc:31369] [[56810,0],0] tcp:set_module called for peer > [[56810,0],1] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] RECV CONNECT ACK FROM [[56810,0],0] ON > SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] waiting for connect ack from [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] connect ack received from [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] 
[[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus17.nlroc:60584] [[56810,0],1] connect-ack recvd from [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] connect-ack header from [[56810,0],0] is > okay > [nexus17.nlroc:60584] [[56810,0],1] waiting for connect ack from [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] connect ack received from [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] connect-ack version from [[56810,0],0] > matches ours > [nexus17.nlroc:60584] [[56810,0],1] connect-ack [[56810,0],0] authenticated > [nexus17.nlroc:60584] [[56810,0],1]-[[56810,0],0] tcp_peer_connected on > socket 9 > [nexus17.nlroc:60584] [[56810,0],1]-[[56810,0],0] connected: 172.16.1.17 - > 172.16.1.10 nodelay 0 sndbuf 131768 rcvbuf 131768 flags 00000006 > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler starting send/recv events > [nexus17.nlroc:60[nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > 584] [[56810,0],1] tcp:set_module called for peer [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 9699 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 9699 BYTES FOR DEST [[56810,0],0] TAG 10 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 9699 BYTES ON SOCKET 9 > [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri > 3723100160.1;tcp://172.16.1.17,169.254.210.43:54613 > [nexus10.nlroc:31369] [[56810,0],0]:set_addr checking if peer [[56810,0],1] > is reachable via 
component tcp > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp: working peer [[56810,0],1] > address tcp://172.16.1.17,169.254.210.43:54613 > [nexus10.nlroc:31369] [[56810,0],0] PEER [[56810,0],1] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus10.nlroc:31369] [[56810,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus10.nlroc:31369] [[56810,0],0]:tcp set addr for peer [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS > 169.254.210.43 > [nexus10.nlroc:31369] [[56810,0],0]: peer [[56810,0],1] is reachable via > component tcp > [nexus10.nlroc:31369] [[56810,0],0] OOB_SEND: rml_oob_send.c:199 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:processing set_peer cmd for interface > en0 > [nexus10.nlroc:31369] [[56810,0],0] oob:base:send to target [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] oob:base:send known transport for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:send_nb to peer [[56810,0],1]:1 > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb to peer [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:508] post send to [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:442] processing send to peer > [[56810,0],1]:1 > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb: already connected to > [[56810,0],1] - queueing for send > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:469] queue send to > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler called to send to peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler SENDING TO [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] MESSAGE SEND COMPLETE TO [[56810,0],1] OF > 105 BYTES ON SOCKET 12 > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate new recv msg > [nexus17.nlroc:60584] 
[[56810,0],1]:tcp:recv:handler read hdr > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate data region of > size 105 > [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri > 3723100160.0;tcp://172.16.1.10:50908 > [nexus10.nlroc:31369] [[56810,0],0]:set_addr peer [[56810,0],0] is me > [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri > 3723100160.1;tcp://172.16.1.17,169.254.210.43:54613 > [nexus10.nlroc:31369] [[56810,0],0]:set_addr checking if peer [[56810,0],1] > is reachable via component tcp > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp: working peer [[56810,0],1] > address tcp://172.16.1.17,169.254.210.43:54613 > [nexus10.nlroc:31369] [[56810,0],0] PEER [[56810,0],1] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus10.nlroc:31369] [[56810,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus10.nlroc:31369] [[56810,0],0]:tcp set addr for peer [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS > 169.254.210.43 > [nexus10.nlroc:31369] [[56810,0],0]: peer [[56810,0],1] is reachable via > component tcp > [nexus10.nlroc:31369] [[56810,0],0] OOB_SEND: rml_oob_send.c:199 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:processing set_peer cmd for interface > en0 > [nexus10.nlroc:31369] [[56810,0],0] oob:base:send to target [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] oob:base:send known transport for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:send_nb to peer [[56810,0],1]:15 > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb to peer [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:508] post send to [[56810,0],1] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60584] [[56810,0],1] RECVD COMPLETE MESSAGE FROM [[56810,0],0] > (ORIGIN [[56810,0],0]) OF 105 BYTES FOR DEST [[56810,0],1] TAG 1 > [nexus17.nlroc:60584] 
[[56810,0],1] DELIVERING TO RML > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:442] processing send to peer > [[56810,0],1]:15 > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb: already connected to > [[56810,0],1] - queueing for send > [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:469] queue send to > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler called to send to peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler SENDING TO [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] MESSAGE SEND COMPLETE TO [[56810,0],1] OF > 885 BYTES ON SOCKET 12 > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate new recv msg > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler read hdr > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate data region of > size 885 > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60584] [[56810,0],1] RECVD COMPLETE MESSAGE FROM [[56810,0],0] > (ORIGIN [[56810,0],0]) OF 885 BYTES FOR DEST [[56810,0],1] TAG 15 > [nexus17.nlroc:60584] [[56810,0],1] DELIVERING TO RML > [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri > 3723100160.0;tcp://172.16.1.10:50908 > [nexus17.nlroc:60584] [[56810,0],1]:set_addr checking if peer [[56810,0],0] > is reachable via component tcp > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp: working peer [[56810,0],0] > address tcp://172.16.1.10:50908 > [nexus17.nlroc:60584] [[56810,0],1] PEER [[56810,0],0] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus17.nlroc:60584] [[56810,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus17.nlroc:60584] [[56810,0],1]:tcp set addr for peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]: peer [[56810,0],0] 
is reachable via > component tcp > [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri > 3723100160.1;tcp://172.16.1.17,169.254.210.43:54613 > [nexus17.nlroc:60584] [[56810,0],1]:set_addr peer [[56810,0],1] is me > [nexus17.nlroc:60584] [[56810,0],1]:tcp:processing set_peer cmd for interface > en0 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:5 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:5 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 54 > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 54 BYTES ON SOCKET 9 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 54 BYTES FOR DEST 
[[56810,0],0] TAG 5 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus10.nlroc:31369] [[56810,0],0] plm:base:receive update proc state > command from [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0] plm:base:receive got update_proc_state > for job [56810,1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 183 > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 183 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer 
[[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 118 BYTES ON SOCKET 9 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 183 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] 
[[56810,0],0]:tcp:recv:handler allocate data region of > size 118 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 118 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60585] mca: base: components_register: registering oob > components > [nexus17.nlroc:60585] mca: base: components_register: found loaded component > tcp > [nexus17.nlroc:60585] mca: base: components_register: component tcp register > function successful > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 294 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE 
TO [[56810,0],0] OF > 199 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 203 BYTES ON SOCKET 9 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 294 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 294 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] 
[[56810,0],0]:tcp:recv:handler allocate data region of > size 199 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 199 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60585] mca: base: components_register: registering plm > components > [nexus17.nlroc:60585] mca: base: components_register: found loaded component > isolated > [nexus17.nlroc:60585] mca: base: components_register: component isolated has > no register or open function > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 203 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 203 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60585] mca: base: components_register: found loaded component > rsh > [nexus17.nlroc:60585] mca: base: components_register: component rsh register > function successful > [nexus17.nlroc:60585] mca: base: components_register: found loaded component > slurm > [nexus17.nlroc:60585] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 92 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] 
[[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 92 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] 
MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 395 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 92 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 395 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 395 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 572 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] 
[[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 1009 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] > DELIVERING TO RML > oob:base:send known transport for peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > Package: Open MPI XXX@nexus10.nlroc Distribution > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 572 > 
[nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 773 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 572 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 1009 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > 
[nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 558 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: a[nexus10.nlroc:31369] > [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN > [[56810,0],1]) OF 1009 BYTES FOR DEST [[56810,0],0] TAG 2 > lready connected to [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 484 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - 
queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 747 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer > [[56810,0],1] > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 773 > [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 773 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 591 BYTES ON SOCKET 9 
> [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:[nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called > for peer [[56810,0],1] > 60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to > [[56810,0],0] - queueing for send > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF > 635 BYTES ON SOCKET 9 > [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer > [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 > [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] > [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer > [[56810,0],0]:2 > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of > size 558 > 
[nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] > (ORIGIN [[56810,0],1]) OF 558 BYTES FOR DEST [[56810,0],0] TAG 2 > [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML > Open MPI: 1.8.1 > Open MPI repo revision: r31483 > Open MPI release date: Apr 22, 2014 > … > it continues and fully finishes > > Case 3 runs > > /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca oob_base_verbose > 10 -mca plm_base_verbose 10 --debug-daemons -host nexus17 ompi_info > [nexus10.nlroc:31479] mca: base: components_register: registering plm > components > [nexus10.nlroc:31479] mca: base: components_register: found loaded component > isolated > [nexus10.nlroc:31479] mca: base: components_register: component isolated has > no register or open function > [nexus10.nlroc:31479] mca: base: components_register: found loaded component > rsh > [nexus10.nlroc:31479] mca: base: components_register: component rsh register > function successful > [nexus10.nlroc:31479] mca: base: components_register: found loaded component > slurm > [nexus10.nlroc:31479] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:31479] mca: base: components_open: opening plm components > [nexus10.nlroc:31479] mca: base: components_open: found loaded component > isolated > [nexus10.nlroc:31479] mca: base: components_open: component isolated open > function successful > [nexus10.nlroc:31479] mca: base: components_open: found loaded component rsh > [nexus10.nlroc:31479] mca: base: components_open: component rsh open function > successful > [nexus10.nlroc:31479] mca: base: components_open: found loaded component slurm > [nexus10.nlroc:31479] mca: base: components_open: component slurm open > function successful > [nexus10.nlroc:31479] mca:base:select: Auto-selecting plm components > [nexus10.nlroc:31479] mca:base:select:( plm) Querying component [isolated] > [nexus10.nlroc:31479] mca:base:select:( plm) Query of component [isolated] > set 
priority to 0 > [nexus10.nlroc:31479] mca:base:select:( plm) Querying component [rsh] > [nexus10.nlroc:31479] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus10.nlroc:31479] mca:base:select:( plm) Querying component [slurm] > [nexus10.nlroc:31479] mca:base:select:( plm) Skipping component [slurm]. > Query failed to return a module > [nexus10.nlroc:31479] mca:base:select:( plm) Selected component [rsh] > [nexus10.nlroc:31479] mca: base: close: component isolated closed > [nexus10.nlroc:31479] mca: base: close: unloading component isolated > [nexus10.nlroc:31479] mca: base: close: component slurm closed > [nexus10.nlroc:31479] mca: base: close: unloading component slurm > [nexus10.nlroc:31479] mca: base: components_register: registering oob > components > [nexus10.nlroc:31479] mca: base: components_register: found loaded component > tcp > [nexus10.nlroc:31479] mca: base: components_register: component tcp register > function successful > [nexus10.nlroc:31479] mca: base: components_open: opening oob components > [nexus10.nlroc:31479] mca: base: components_open: found loaded component tcp > [nexus10.nlroc:31479] mca: base: components_open: component tcp open function > successful > [nexus10.nlroc:31479] mca:oob:select: checking available component tcp > [nexus10.nlroc:31479] mca:oob:select: Querying component [tcp] > [nexus10.nlroc:31479] oob:tcp: component_available called > [nexus10.nlroc:31479] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 > [nexus10.nlroc:31479] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 > [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:init creating module for V4 > address on interface en0 > [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:init adding 172.16.1.10 to our > list of V4 connections > [nexus10.nlroc:31479] [[56724,0],0] TCP STARTUP > [nexus10.nlroc:31479] [[56724,0],0] attempting to bind to IPv4 port 0 > [nexus10.nlroc:31479] [[56724,0],0] assigned IPv4 port 50923 > [nexus10.nlroc:31479] mca:oob:select: Adding 
component to end > [nexus10.nlroc:31479] mca:oob:select: Found 1 active transports > Daemon was launched on nexus17.nlroc - beginning to initialize > [nexus17.nlroc:60663] mca: base: components_register: registering plm > components > [nexus17.nlroc:60663] mca: base: components_register: found loaded component > rsh > [nexus17.nlroc:60663] mca: base: components_register: component rsh register > function successful > [nexus17.nlroc:60663] mca: base: components_open: opening plm components > [nexus17.nlroc:60663] mca: base: components_open: found loaded component rsh > [nexus17.nlroc:60663] mca: base: components_open: component rsh open function > successful > [nexus17.nlroc:60663] mca:base:select: Auto-selecting plm components > [nexus17.nlroc:60663] mca:base:select:( plm) Querying component [rsh] > [nexus17.nlroc:60663] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus17.nlroc:60663] mca:base:select:( plm) Selected component [rsh] > [nexus17.nlroc:60663] mca: base: components_register: registering oob > components > [nexus17.nlroc:60663] mca: base: components_register: found loaded component > tcp > [nexus17.nlroc:60663] mca: base: components_register: component tcp register > function successful > [nexus17.nlroc:60663] mca: base: components_open: opening oob components > [nexus17.nlroc:60663] mca: base: components_open: found loaded component tcp > [nexus17.nlroc:60663] mca: base: components_open: component tcp open function > successful > [nexus17.nlroc:60663] mca:oob:select: checking available component tcp > [nexus17.nlroc:60663] mca:oob:select: Querying component [tcp] > [nexus17.nlroc:60663] oob:tcp: component_available called > [nexus17.nlroc:60663] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 > [nexus17.nlroc:60663] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init creating module for V4 > address on interface en0 > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init adding 
172.16.1.17 to our > list of V4 connections > [nexus17.nlroc:60663] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4 > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init creating module for V4 > address on interface en2 > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init adding 169.254.210.43 to our > list of V4 connections > [nexus17.nlroc:60663] [[56724,0],1] TCP STARTUP > [nexus17.nlroc:60663] [[56724,0],1] attempting to bind to IPv4 port 0 > [nexus17.nlroc:60663] [[56724,0],1] assigned IPv4 port 54631 > [nexus17.nlroc:60663] mca:oob:select: Adding component to end > [nexus17.nlroc:60663] mca:oob:select: Found 1 active transports > Daemon [[56724,0],1] checking in as pid 60663 on host nexus17 > [nexus17.nlroc:60663] [[56724,0],1] orted: up and running - waiting for > commands! > [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri > 3717464064.0;tcp://172.16.1.10:50923 > [nexus17.nlroc:60663] [[56724,0],1]:set_addr checking if peer [[56724,0],0] > is reachable via component tcp > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp: working peer [[56724,0],0] > address tcp://172.16.1.10:50923 > [nexus17.nlroc:60663] [[56724,0],1] PEER [[56724,0],0] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus17.nlroc:60663] [[56724,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus17.nlroc:60663] [[56724,0],1]:tcp set addr for peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]: peer [[56724,0],0] is reachable via > component tcp > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1]:tcp:processing set_peer cmd for interface > en0 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:10 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] 
processing send to peer > [[56724,0],0]:10 > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:476] queue pending to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: initiating connection to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:490] connect to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56724,0],0] via interface en0 > [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56724,0],0] via interface en0 on socket 9 > [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to > connect to proc [[56724,0],0] on 172.16.1.10:50923 - 0 retries > [nexus17.nlroc:60663] [[56724,0],1] waiting for connect completion to > [[56724,0],0] - activating send event > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler CONNECTING > [nexus17.nlroc:60663] [[56724,0],1]:tcp:complete_connect called for peer > [[56724,0],0] on socket 9 > [nexus10.nlroc:31479] [[56724,0],0] mca_oob_tcp_listen_thread: new > connection: (12, 0) 172.16.1.17:54632 > [nexus17.nlroc:60663] [[56724,0],1] tcp_peer_complete_connect: sending ack to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] SEND CONNECT ACK > [nexus17.nlroc:60663] [[56724,0],1] send blocking of 40 bytes to socket 9 > [nexus17.nlroc:60663] [[56724,0],1] connect-ack sent to socket 9 > [nexus17.nlroc:60663] [[56724,0],1] tcp_peer_complete_connect: setting read > event on connection to [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0] connection_handler: working connection > (12, 35) 172.16.1.17:54632 > [nexus10.nlroc:31479] [[56724,0],0] accept_connection: 172.16.1.17:54632 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called > [nexus10.nlroc:31479] [[56724,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 12 > [nexus10.nlroc:31479] [[56724,0],0] waiting for connect ack 
from UNKNOWN > [nexus10.nlroc:31479] [[56724,0],0] connect ack received from UNKNOWN > [nexus10.nlroc:31479] [[56724,0],0] connect-ack recvd from UNKNOWN > [nexus10.nlroc:31479] [[56724,0],0] mca_oob_tcp_recv_connect: connection from > new peer > [nexus10.nlroc:31479] [[56724,0],0] connect-ack header from [[56724,0],1] is > okay > [nexus10.nlroc:31479] [[56724,0],0] waiting for connect ack from [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] connect ack received from [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] connect-ack version from [[56724,0],1] > matches ours > [nexus10.nlroc:31479] [[56724,0],0] connect-ack [[56724,0],1] authenticated > [nexus10.nlroc:31479] [[56724,0],0] tcp:peer_accept called for peer > [[56724,0],1] in state UNKNOWN on socket 12 > [nexus10.nlroc:31479] [[56724,0],0] SEND CONNECT ACK > [nexus10.nlroc:31479] [[56724,0],0] send blocking of 40 bytes to socket 12 > [nexus10.nlroc:31479] [[56724,0],0] connect-ack sent to socket 12 > [nexus10.nlroc:31479] [[56724,0],0]-[[56724,0],1] tcp_peer_connected on > socket 12 > [nexus10.nlroc:31479] [[56724,0],0]-[[56724,0],1] accepted: 172.16.1.10 - > 172.16.1.17 nodelay 0 sndbuf 131072 rcvbuf 131072 flags 00000006 > [nexus10.nlroc:31479] [[56724,0],0] tcp:set_module called for peer > [[56724,0],1] > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] RECV CONNECT ACK FROM [[56724,0],0] ON > SOCKET 9 > [nexus17.nlroc:60663] [[56724,0],1] waiting for connect ack from [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] connect ack received from [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] connect-ack recvd from [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] connect-ack header from [[56724,0],0] is > okay > [nexus17.nlroc:60663] [[56724,0],1] waiting for connect ack from [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] connect ack received from [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler 
called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 9699 > [nexus17.nlroc:60663] [[56724,0],1] connect-ack version from [[56724,0],0] > matches ours > [nexus17.nlroc:60663] [[56724,0],1] connect-ack [[56724,0],0] authenticated > [nexus17.nlroc:60663] [[56724,0],1]-[[56724,0],0] tcp_peer_connected on > socket 9 > [nexus17.nlroc:60663] [[56724,0],1]-[[56724,0],0] connected: 172.16.1.17 - > 172.16.1.10 nodelay 0 sndbuf 131768 rcvbuf 131768 flags 00000006 > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler starting send/recv events > [nexus17.nlroc:60663] [[56724,0],1] tcp:set_module called for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 9699 BYTES ON SOCKET 9 > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 9699 BYTES FOR DEST [[56724,0],0] TAG 10 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri > 3717464064.1;tcp://172.16.1.17,169.254.210.43:54631 > [nexus10.nlroc:31479] [[56724,0],0]:set_addr checking if peer [[56724,0],1] > is reachable via component tcp > [nexus10.nlroc:31479] [[56724,0],0] oob:tcp: working peer [[56724,0],1] > address tcp://172.16.1.17,169.254.210.43:54631 > [nexus10.nlroc:31479] [[56724,0],0] PEER [[56724,0],1] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus10.nlroc:31479] [[56724,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus10.nlroc:31479] [[56724,0],0]:tcp set addr for peer 
[[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS > 169.254.210.43 > [nexus10.nlroc:31479] [[56724,0],0]: peer [[56724,0],1] is reachable via > component tcp > [nexus10.nlroc:31479] [[56724,0],0]:tcp:processing set_peer cmd for interface > en0 > [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri > 3717464064.0;tcp://172.16.1.10:50923 > [nexus10.nlroc:31479] [[56724,0],0]:set_addr peer [[56724,0],0] is me > [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri > 3717464064.1;tcp://172.16.1.17,169.254.210.43:54631 > [nexus10.nlroc:31479] [[56724,0],0]:set_addr checking if peer [[56724,0],1] > is reachable via component tcp > [nexus10.nlroc:31479] [[56724,0],0] oob:tcp: working peer [[56724,0],1] > address tcp://172.16.1.17,169.254.210.43:54631 > [nexus10.nlroc:31479] [[56724,0],0] PEER [[56724,0],1] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus10.nlroc:31479] [[56724,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus10.nlroc:31479] [[56724,0],0]:tcp set addr for peer [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS > 169.254.210.43 > [nexus10.nlroc:31479] [[56724,0],0]: peer [[56724,0],1] is reachable via > component tcp > [nexus10.nlroc:31479] [[56724,0],0] OOB_SEND: rml_oob_send.c:199 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:processing set_peer cmd for interface > en0 > [nexus10.nlroc:31479] [[56724,0],0] oob:base:send to target [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] oob:base:send known transport for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:send_nb to peer [[56724,0],1]:15 > [nexus10.nlroc:31479] [[56724,0],0] tcp:send_nb to peer [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:508] post send to [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] orted_cmd: received add_local_procs > [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:442] processing send to peer > [[56724,0],1]:15 > 
[nexus10.nlroc:31479] [[56724,0],0] tcp:send_nb: already connected to > [[56724,0],1] - queueing for send > [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:469] queue send to > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] tcp:send_handler called to send to peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] tcp:send_handler SENDING TO [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] MESSAGE SEND COMPLETE TO [[56724,0],1] OF > 956 BYTES ON SOCKET 12 > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler allocate new recv msg > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler read hdr > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler allocate data region of > size 956 > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60663] [[56724,0],1] RECVD COMPLETE MESSAGE FROM [[56724,0],0] > (ORIGIN [[56724,0],0]) OF 956 BYTES FOR DEST [[56724,0],1] TAG 15 > [nexus17.nlroc:60663] [[56724,0],1] DELIVERING TO RML > [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri > 3717464064.0;tcp://172.16.1.10:50923 > [nexus17.nlroc:60663] [[56724,0],1]:set_addr checking if peer [[56724,0],0] > is reachable via component tcp > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp: working peer [[56724,0],0] > address tcp://172.16.1.10:50923 > [nexus17.nlroc:60663] [[56724,0],1] PEER [[56724,0],0] MAY BE REACHABLE USING > MODULE AT KINDEX 2 INTERFACE en0 > [nexus17.nlroc:60663] [[56724,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 > AT KERNEL INDEX 2 > [nexus17.nlroc:60663] [[56724,0],1]:tcp set addr for peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]: peer [[56724,0],0] is reachable via > component tcp > [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri > 
3717464064.1;tcp://172.16.1.17,169.254.210.43:54631 > [nexus17.nlroc:60663] [[56724,0],1]:set_addr peer [[56724,0],1] is me > [nexus17.nlroc:60663] [[56724,0],1]:tcp:processing set_peer cmd for interface > en0 > [nexus17.nlroc:60663] [[56724,0],1] orted_cmd: received add_local_procs > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:5 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 54 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 54 BYTES FOR DEST [[56724,0],0] TAG 5 > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer > [[56724,0],0]:5 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 54 BYTES ON SOCKET 9 > 
[nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus10.nlroc:31479] [[56724,0],0] plm:base:receive update proc state > command from [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0] plm:base:receive got update_proc_state > for job [56724,1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 183 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 183 BYTES FOR DEST [[56724,0],0] TAG 2 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer > [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 
> 183 BYTES ON SOCKET 9 > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 > [nexus17.nlroc:[nexus17.nlroc:60664] mca: base: components_register: > registering oob components > [nexus17.nlroc:60664] mca: base: components_register: found loaded component > tcp > 60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer > [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 118 BYTES ON SOCKET 9 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 118 > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 118 BYTES FOR DEST [[56724,0],0] TAG 2 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus17.nlroc:60664] mca: base: components_register: component tcp register > function successful > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > 
[nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 294 > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer > [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 294 BYTES ON SOCKET 9 > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer > [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - 
queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 282 BYTES ON SOCKET 9 > [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] > [nexus17.nlroc:60663] [[56724,[nexus10.nlroc:31479] > [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] > 0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to > [[56724,0],0] - queueing for send > [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to > [[56724,0],0] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 294 BYTES FOR DEST [[56724,0],0] TAG 2 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer > [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] > [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF > 120 BYTES ON SOCKET 9 > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > 
[nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 282 > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 282 BYTES FOR DEST [[56724,0],0] TAG 2 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 120 > [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] > (ORIGIN [[56724,0],1]) OF 120 BYTES FOR DEST [[56724,0],0] TAG 2 > [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML > [nexus17.nlroc:60664] mca: base: components_register: registering plm > components > [nexus17.nlroc:60664] mca: base: components_register: found loaded component > isolated > [nexus17.nlroc:60664] mca: base: components_register: component isolated has > no register or open function > [nexus17.nlroc:60664] mca: base: components_register: found loaded component > rsh > [nexus17.nlroc:60664] mca: base: components_register: component rsh register > function successful > [nexus17.nlroc:60664] mca: base: components_register: found loaded component > slurm > [nexus17.nlroc:60664] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer > [[56724,0],1] > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr > [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of > size 92 > 
[nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 92 BYTES FOR DEST [[56724,0],0] TAG 2
> [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML
> [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1]
> [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED
> [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg
> [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr
> [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 560
> [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 560 BYTES FOR DEST [[56724,0],0] TAG 2
> [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML
> Package: Open MPI XXXX@nexus10.nlroc Distribution
>
> …
>
> and it continues until the end
>
> On Tue, Jul 15, 2014 at 2:58 PM, Ralph Castain <r...@open-mpi.org> wrote:
> I'm afraid I don't understand your comment about "another mpi process". Looking at your output, it would appear that there is something going on with host nexus17. In both cases, mpirun is launching a single daemon onto only one other node - the only difference was in the node being used. The "no_tree_spawn" flag did nothing as that only applies when there are multiple nodes being used.
>
> I would check to see if there is a firewall between nexus10 and nexus17. You can also add -mca oob_base_verbose 10 to your cmd line and see if the daemon on nexus17 is able to connect back to mpirun, and add --debug-daemons to see any error messages that daemon may be trying to report.
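For anyone following the firewall suggestion above, here is a rough sketch (not part of Open MPI) of a plain TCP reachability check in Python. The function name can_connect and any host or port you pass to it are assumptions for illustration only. The daemon's real port is chosen at run time, so this only tells you whether TCP traffic between two nodes is blocked on a port where you have put a listener yourself.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper: it only tests raw TCP reachability and knows
    nothing about Open MPI's own wire-up protocol.
    """
    try:
        # create_connection resolves the name and attempts the connect;
        # refusal, timeout, and name-resolution failures are all OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (host name and port are placeholders, not values to rely on):
#   can_connect("nexus10", 5000)
```

Run it on the compute node against a port you have opened on the node where mpirun runs. A False result while ssh still works in the other direction would be consistent with a firewall dropping the daemon's call back to mpirun, but it does not prove it.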
> > > On Jul 15, 2014, at 3:08 AM, Ricardo Fernández-Perea > <rfernandezpe...@gmail.com> wrote: > >> I have try if another mpi process is running in the node already the process >> run >> >> $ricardo$ /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca >> plm_base_verbose 10 -host nexus16 ompi_info >> [nexus10.nlroc:27397] mca: base: components_register: registering plm >> components >> [nexus10.nlroc:27397] mca: base: components_register: found loaded component >> isolated >> [nexus10.nlroc:27397] mca: base: components_register: component isolated has >> no register or open function >> [nexus10.nlroc:27397] mca: base: components_register: found loaded component >> rsh >> [nexus10.nlroc:27397] mca: base: components_register: component rsh register >> function successful >> [nexus10.nlroc:27397] mca: base: components_register: found loaded component >> slurm >> [nexus10.nlroc:27397] mca: base: components_register: component slurm >> register function successful >> [nexus10.nlroc:27397] mca: base: components_open: opening plm components >> [nexus10.nlroc:27397] mca: base: components_open: found loaded component >> isolated >> [nexus10.nlroc:27397] mca: base: components_open: component isolated open >> function successful >> [nexus10.nlroc:27397] mca: base: components_open: found loaded component rsh >> [nexus10.nlroc:27397] mca: base: components_open: component rsh open >> function successful >> [nexus10.nlroc:27397] mca: base: components_open: found loaded component >> slurm >> [nexus10.nlroc:27397] mca: base: components_open: component slurm open >> function successful >> [nexus10.nlroc:27397] mca:base:select: Auto-selecting plm components >> [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [isolated] >> [nexus10.nlroc:27397] mca:base:select:( plm) Query of component [isolated] >> set priority to 0 >> [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [rsh] >> [nexus10.nlroc:27397] mca:base:select:( plm) Query of component [rsh] 
set >> priority to 10 >> [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [slurm] >> [nexus10.nlroc:27397] mca:base:select:( plm) Skipping component [slurm]. >> Query failed to return a module >> [nexus10.nlroc:27397] mca:base:select:( plm) Selected component [rsh] >> [nexus10.nlroc:27397] mca: base: close: component isolated closed >> [nexus10.nlroc:27397] mca: base: close: unloading component isolated >> [nexus10.nlroc:27397] mca: base: close: component slurm closed >> [nexus10.nlroc:27397] mca: base: close: unloading component slurm >> [nexus10.nlroc:27397] [[52326,0],0] plm:base:receive update proc state >> command from [[52326,0],1] >> [nexus10.nlroc:27397] [[52326,0],0] plm:base:receive got update_proc_state >> for job [52326,1] >> [nexus16.nlroc:59687] mca: base: components_register: registering plm >> components >> [nexus16.nlroc:59687] mca: base: components_register: found loaded component >> isolated >> [nexus16.nlroc:59687] mca: base: components_register: component isolated has >> no register or open function >> [nexus16.nlroc:59687] mca: base: components_register: found loaded component >> rsh >> [nexus16.nlroc:59687] mca: base: components_register: component rsh register >> function successful >> [nexus16.nlroc:59687] mca: base: components_register: found loaded component >> slurm >> [nexus16.nlroc:59687] mca: base: components_register: component slurm >> register function successful >> Package: Open MPI XXXX@nexus10.nlroc Distribution >> Open MPI: 1.8.1 >> Open MPI repo revision: r31483 >> Open MPI release date: Apr 22, 2014 >> Open RTE: 1.8.1 >> … >> >> but if the compute node has not a mpi process running in it it already hangs >> as >> >> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca plm_base_verbose >> 10 -host nexus17 ompi_info >> [nexus10.nlroc:27438] mca: base: components_register: registering plm >> components >> [nexus10.nlroc:27438] mca: base: components_register: found loaded component >> isolated >> 
[nexus10.nlroc:27438] mca: base: components_register: component isolated has >> no register or open function >> [nexus10.nlroc:27438] mca: base: components_register: found loaded component >> rsh >> [nexus10.nlroc:27438] mca: base: components_register: component rsh register >> function successful >> [nexus10.nlroc:27438] mca: base: components_register: found loaded component >> slurm >> [nexus10.nlroc:27438] mca: base: components_register: component slurm >> register function successful >> [nexus10.nlroc:27438] mca: base: components_open: opening plm components >> [nexus10.nlroc:27438] mca: base: components_open: found loaded component >> isolated >> [nexus10.nlroc:27438] mca: base: components_open: component isolated open >> function successful >> [nexus10.nlroc:27438] mca: base: components_open: found loaded component rsh >> [nexus10.nlroc:27438] mca: base: components_open: component rsh open >> function successful >> [nexus10.nlroc:27438] mca: base: components_open: found loaded component >> slurm >> [nexus10.nlroc:27438] mca: base: components_open: component slurm open >> function successful >> [nexus10.nlroc:27438] mca:base:select: Auto-selecting plm components >> [nexus10.nlroc:27438] mca:base:select:( plm) Querying component [isolated] >> [nexus10.nlroc:27438] mca:base:select:( plm) Query of component [isolated] >> set priority to 0 >> [nexus10.nlroc:27438] mca:base:select:( plm) Querying component [rsh] >> [nexus10.nlroc:27438] mca:base:select:( plm) Query of component [rsh] set >> priority to 10 >> [nexus10.nlroc:27438] mca:base:select:( plm) Querying component [slurm] >> [nexus10.nlroc:27438] mca:base:select:( plm) Skipping component [slurm]. 
>> Query failed to return a module
>> [nexus10.nlroc:27438] mca:base:select:( plm) Selected component [rsh]
>> [nexus10.nlroc:27438] mca: base: close: component isolated closed
>> [nexus10.nlroc:27438] mca: base: close: unloading component isolated
>> [nexus10.nlroc:27438] mca: base: close: component slurm closed
>> [nexus10.nlroc:27438] mca: base: close: unloading component slurm
>>
>> and it stops there
>>
>> On Mon, Jul 14, 2014 at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hmmm...no, it worked just fine for me. It sounds like something else is going on.
>>
>> Try configuring OMPI with --enable-debug, and then add -mca plm_base_verbose 10 to get a better sense of what is going on.
>>
>> On Jul 14, 2014, at 10:27 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> I confess I haven't tested no_tree_spawn in ages, so it is quite possible it has suffered bit rot. I can try to take a look at it in a bit.
>>>
>>> On Jul 14, 2014, at 10:13 AM, Ricardo Fernández-Perea <rfernandezpe...@gmail.com> wrote:
>>>
>>>> Thank you for the fast answer.
>>>>
>>>> While that resolves my problem with cross ssh authentication, a command such as
>>>>
>>>> /opt/openmpi/bin/mpirun --mca mtl mx --mca pml cm --mca plm_rsh_no_tree_spawn 1 -hostfile hostfile ompi_info
>>>>
>>>> just hung with no output, and although there is an ssh connection, no orte program is started on the destination nodes.
>>>>
>>>> And while
>>>>
>>>> /opt/openmpi/bin/mpirun -host host18 ompi_info
>>>>
>>>> works,
>>>>
>>>> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -host host18 ompi_info
>>>>
>>>> hangs. Is there some condition on the use of this parameter?
>>>>
>>>> Yours truly
>>>>
>>>> Ricardo
>>>>
>>>> On Mon, Jul 14, 2014 at 6:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> During the 1.7 series and for all follow-on series, OMPI changed to a mode where it launches a daemon on all allocated nodes at the startup of mpirun. This allows us to determine the hardware topology of the nodes and take that into account when mapping. You can override that behavior by either adding --novm to your cmd line (which will impact your mapping/binding options), or by specifying the hosts to use by editing your hostfile, or adding --host host1,host2 to your cmd line.
>>>>
>>>> The rsh launcher defaults to a tree-based pattern, thus requiring that we be able to ssh from one compute node to another. You can change that to a less scalable direct mode by adding
>>>>
>>>> --mca plm_rsh_no_tree_spawn 1
>>>>
>>>> to the cmd line.
>>>>
>>>> On Jul 14, 2014, at 9:21 AM, Ricardo Fernández-Perea <rfernandezpe...@gmail.com> wrote:
>>>>
>>>> > I'm trying to update to Open MPI 1.8.1 over ssh and Myrinet,
>>>> >
>>>> > running a command such as
>>>> >
>>>> > /opt/openmpi/bin/mpirun --verbose --mca mtl mx --mca pml cm -hostfile hostfile -np 16
>>>> >
>>>> > When the hostfile contains only two nodes, as
>>>> >
>>>> > host1 slots=8 max-slots=8
>>>> > host2 slots=8 max-slots=8
>>>> >
>>>> > it runs perfectly, but when the hostfile has a third node, as
>>>> >
>>>> > host1 slots=8 max-slots=8
>>>> > host2 slots=8 max-slots=8
>>>> > host3 slots=8 max-slots=8
>>>> >
>>>> > it tries to establish an ssh connection between the running host1 and host3, which should not run any process. That connection fails, hanging the process without signaling.
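To make the tree-based versus direct launch patterns described above concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the function name launch_edges, the "head" node name, and the radix-2 tree layout are all made up for illustration, and the thread does not say what tree shape the rsh launcher actually uses. The point is just that in any tree pattern some daemons are started from a compute node rather than from the node running mpirun, which is why compute-to-compute ssh must work.

```python
def launch_edges(nodes, tree=True, radix=2):
    """Return (launcher, target) pairs for starting one daemon per node.

    nodes[0] is assumed to be the node where mpirun runs.
    tree=True  -> toy radix tree: node i is started by node (i - 1) // radix.
    tree=False -> direct mode: every daemon is started from nodes[0].
    This is an illustrative model, not Open MPI's actual algorithm.
    """
    edges = []
    for i in range(1, len(nodes)):
        parent = (i - 1) // radix if tree else 0
        edges.append((nodes[parent], nodes[i]))
    return edges


nodes = ["head", "host1", "host2", "host3"]

# Tree mode: host3 is reached from host1, so host1 -> host3 ssh is needed.
print(launch_edges(nodes, tree=True))
# [('head', 'host1'), ('head', 'host2'), ('host1', 'host3')]

# Direct mode (the idea behind plm_rsh_no_tree_spawn): only head opens ssh.
print(launch_edges(nodes, tree=False))
# [('head', 'host1'), ('head', 'host2'), ('head', 'host3')]
```

In this toy layout the two-node hostfile never needs a compute-to-compute connection, while adding a third node does, which matches the symptom reported here.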
>>>> >
>>>> > my ompi_info is as follows
>>>> >
>>>> > Package: Open MPI XXX Distribution
>>>> > Open MPI: 1.8.1
>>>> > Open MPI repo revision: r31483
>>>> > Open MPI release date: Apr 22, 2014
>>>> > Open RTE: 1.8.1
>>>> > Open RTE repo revision: r31483
>>>> > Open RTE release date: Apr 22, 2014
>>>> > OPAL: 1.8.1
>>>> > OPAL repo revision: r31483
>>>> > OPAL release date: Apr 22, 2014
>>>> > MPI API: 3.0
>>>> > Ident string: 1.8.1
>>>> > Prefix: /opt/openmpi
>>>> > Configured architecture: x86_64-apple-darwin9.8.0
>>>> > Configure host: XXXX
>>>> > Configured by: XXXX
>>>> > Configured on: Thu Jun 12 10:37:33 CEST 2014
>>>> > Configure host: XXXX
>>>> > Built by: XXXX
>>>> > Built on: Thu Jun 12 11:13:16 CEST 2014
>>>> > Built host: XXXX
>>>> > C bindings: yes
>>>> > C++ bindings: yes
>>>> > Fort mpif.h: yes (single underscore)
>>>> > Fort use mpi: yes (full: ignore TKR)
>>>> > Fort use mpi size: deprecated-ompi-info-value
>>>> > Fort use mpi_f08: yes
>>>> > Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the ifort compiler, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality
>>>> > Fort mpi_f08 subarrays: no
>>>> > Java bindings: no
>>>> > Wrapper compiler rpath: unnecessary
>>>> > C compiler: icc
>>>> > C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc
>>>> > C compiler family name: INTEL
>>>> > C compiler version: 1110.20091130
>>>> > C++ compiler: icpc
>>>> > C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc
>>>> > Fort compiler: ifort
>>>> > Fort compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
>>>> > Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
>>>> > Fort 08 assumed shape: no
>>>> > Fort optional args: yes
>>>> > Fort BIND(C) (all): yes
>>>> > Fort ISO_C_BINDING: yes
>>>> > Fort SUBROUTINE BIND(C): yes
>>>> > Fort TYPE,BIND(C): yes
>>>> > Fort T,BIND(C,name="a"): yes
>>>> > Fort PRIVATE: yes
>>>> > Fort PROTECTED: yes
>>>> > Fort ABSTRACT: yes
>>>> > Fort ASYNCHRONOUS: yes
>>>> > Fort PROCEDURE: yes
>>>> > Fort f08 using wrappers: yes
>>>> > C profiling: yes
>>>> > C++ profiling: yes
>>>> > Fort mpif.h profiling: yes
>>>> > Fort use mpi profiling: yes
>>>> > Fort use mpi_f08 prof: yes
>>>> > C++ exceptions: no
>>>> > Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
>>>> > Sparse Groups: no
>>>> > Internal debug support: no
>>>> > MPI interface warnings: yes
>>>> > MPI parameter check: runtime
>>>> > Memory profiling support: no
>>>> > Memory debugging support: no
>>>> > libltdl support: yes
>>>> > Heterogeneous support: no
>>>> > mpirun default --prefix: no
>>>> > MPI I/O support: yes
>>>> > MPI_WTIME support: gettimeofday
>>>> > Symbol vis. support: yes
>>>> > Host topology support: yes
>>>> > MPI extensions:
>>>> > FT Checkpoint support: no (checkpoint thread: no)
>>>> > C/R Enabled Debugging: no
>>>> > VampirTrace support: yes
>>>> > MPI_MAX_PROCESSOR_NAME: 256
>>>> > MPI_MAX_ERROR_STRING: 256
>>>> > MPI_MAX_OBJECT_NAME: 64
>>>> > MPI_MAX_INFO_KEY: 36
>>>> > MPI_MAX_INFO_VAL: 256
>>>> > MPI_MAX_PORT_NAME: 1024
>>>> > MPI_MAX_DATAREP_STRING: 128
>>>> >
>>>> > _______________________________________________
>>>> > users mailing list
>>>> > us...@open-mpi.org
>>>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/07/24764.php