What I mean by "another MPI process": I have 4 nodes that are already running processes that use MPI and were started with mpirun from the control node. When I run the command against any of those nodes, it executes, but when I run it against any other node, it fails if the no_tree_spawn flag is used; otherwise it works OK.
Case 1: it fails

/opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca oob_base_verbose 10 -mca plm_base_verbose 10 -host nexus17 ompi_info

[nexus10.nlroc:31321] mca: base: components_register: registering plm components
[nexus10.nlroc:31321] mca: base: components_register: found loaded component isolated
[nexus10.nlroc:31321] mca: base: components_register: component isolated has no register or open function
[nexus10.nlroc:31321] mca: base: components_register: found loaded component rsh
[nexus10.nlroc:31321] mca: base: components_register: component rsh register function successful
[nexus10.nlroc:31321] mca: base: components_register: found loaded component slurm
[nexus10.nlroc:31321] mca: base: components_register: component slurm register function successful
[nexus10.nlroc:31321] mca: base: components_open: opening plm components
[nexus10.nlroc:31321] mca: base: components_open: found loaded component isolated
[nexus10.nlroc:31321] mca: base: components_open: component isolated open function successful
[nexus10.nlroc:31321] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:31321] mca: base: components_open: component rsh open function successful
[nexus10.nlroc:31321] mca: base: components_open: found loaded component slurm
[nexus10.nlroc:31321] mca: base: components_open: component slurm open function successful
[nexus10.nlroc:31321] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:31321] mca:base:select:( plm) Querying component [isolated]
[nexus10.nlroc:31321] mca:base:select:( plm) Query of component [isolated] set priority to 0
[nexus10.nlroc:31321] mca:base:select:( plm) Querying component [rsh]
[nexus10.nlroc:31321] mca:base:select:( plm) Query of component [rsh] set priority to 10
[nexus10.nlroc:31321] mca:base:select:( plm) Querying component [slurm]
[nexus10.nlroc:31321] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[nexus10.nlroc:31321] mca:base:select:( plm) Selected component [rsh]
[nexus10.nlroc:31321] mca: base: close: component isolated closed
[nexus10.nlroc:31321] mca: base: close: unloading component isolated
[nexus10.nlroc:31321] mca: base: close: component slurm closed
[nexus10.nlroc:31321] mca: base: close: unloading component slurm
[nexus10.nlroc:31321] mca: base: components_register: registering oob components
[nexus10.nlroc:31321] mca: base: components_register: found loaded component tcp
[nexus10.nlroc:31321] mca: base: components_register: component tcp register function successful
[nexus10.nlroc:31321] mca: base: components_open: opening oob components
[nexus10.nlroc:31321] mca: base: components_open: found loaded component tcp
[nexus10.nlroc:31321] mca: base: components_open: component tcp open function successful
[nexus10.nlroc:31321] mca:oob:select: checking available component tcp
[nexus10.nlroc:31321] mca:oob:select: Querying component [tcp]
[nexus10.nlroc:31321] oob:tcp: component_available called
[nexus10.nlroc:31321] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[nexus10.nlroc:31321] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[nexus10.nlroc:31321] [[56634,0],0] oob:tcp:init creating module for V4 address on interface en0
[nexus10.nlroc:31321] [[56634,0],0] oob:tcp:init adding 172.16.1.10 to our list of V4 connections
[nexus10.nlroc:31321] [[56634,0],0] TCP STARTUP
[nexus10.nlroc:31321] [[56634,0],0] attempting to bind to IPv4 port 0
[nexus10.nlroc:31321] [[56634,0],0] assigned IPv4 port 50898
[nexus10.nlroc:31321] mca:oob:select: Adding component to end
[nexus10.nlroc:31321] mca:oob:select: Found 1 active transports

I press Ctrl-C here when it hangs:

^C[nexus10.nlroc:31321] [[56634,0],0] OOB_SEND: rml_oob_send.c:199
[nexus10.nlroc:31321] [[56634,0],0] oob:base:send to target [[56634,0],1]
[nexus10.nlroc:31321] [[56634,0],0] oob:base:send unknown peer [[56634,0],1]
[nexus10.nlroc:31321] [[56634,0],0] is NOT reachable by TCP
[nexus10.nlroc:31321] mca: base: close: component rsh closed
[nexus10.nlroc:31321] mca: base: close: unloading component rsh
[nexus10.nlroc:31321] [[56634,0],0] TCP SHUTDOWN
[nexus10.nlroc:31321] mca: base: close: component tcp closed
[nexus10.nlroc:31321] mca: base: close: unloading component tcp

Case 2: to the same node, but without the plm_rsh_no_tree_spawn flag

/opt/openmpi/bin/mpirun -mca oob_base_verbose 10 -mca plm_base_verbose 10 -host nexus17 ompi_info

[nexus10.nlroc:31369] mca: base: components_register: registering plm components
[nexus10.nlroc:31369] mca: base: components_register: found loaded component isolated
[nexus10.nlroc:31369] mca: base: components_register: component isolated has no register or open function
[nexus10.nlroc:31369] mca: base: components_register: found loaded component rsh
[nexus10.nlroc:31369] mca: base: components_register: component rsh register function successful
[nexus10.nlroc:31369] mca: base: components_register: found loaded component slurm
[nexus10.nlroc:31369] mca: base: components_register: component slurm register function successful
[nexus10.nlroc:31369] mca: base: components_open: opening plm components
[nexus10.nlroc:31369] mca: base: components_open: found loaded component isolated
[nexus10.nlroc:31369] mca: base: components_open: component isolated open function successful
[nexus10.nlroc:31369] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:31369] mca: base: components_open: component rsh open function successful
[nexus10.nlroc:31369] mca: base: components_open: found loaded component slurm
[nexus10.nlroc:31369] mca: base: components_open: component slurm open function successful
[nexus10.nlroc:31369] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:31369] mca:base:select:( plm) Querying component [isolated]
[nexus10.nlroc:31369] mca:base:select:( plm) Query of component [isolated] set priority to 0
[nexus10.nlroc:31369] mca:base:select:( plm) Querying component [rsh]
[nexus10.nlroc:31369] mca:base:select:( plm) Query of component [rsh] set priority to 10 [nexus10.nlroc:31369] mca:base:select:( plm) Querying component [slurm] [nexus10.nlroc:31369] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module [nexus10.nlroc:31369] mca:base:select:( plm) Selected component [rsh] [nexus10.nlroc:31369] mca: base: close: component isolated closed [nexus10.nlroc:31369] mca: base: close: unloading component isolated [nexus10.nlroc:31369] mca: base: close: component slurm closed [nexus10.nlroc:31369] mca: base: close: unloading component slurm [nexus10.nlroc:31369] mca: base: components_register: registering oob components [nexus10.nlroc:31369] mca: base: components_register: found loaded component tcp [nexus10.nlroc:31369] mca: base: components_register: component tcp register function successful [nexus10.nlroc:31369] mca: base: components_open: opening oob components [nexus10.nlroc:31369] mca: base: components_open: found loaded component tcp [nexus10.nlroc:31369] mca: base: components_open: component tcp open function successful [nexus10.nlroc:31369] mca:oob:select: checking available component tcp [nexus10.nlroc:31369] mca:oob:select: Querying component [tcp] [nexus10.nlroc:31369] oob:tcp: component_available called [nexus10.nlroc:31369] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [nexus10.nlroc:31369] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:init creating module for V4 address on interface en0 [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:init adding 172.16.1.10 to our list of V4 connections [nexus10.nlroc:31369] [[56810,0],0] TCP STARTUP [nexus10.nlroc:31369] [[56810,0],0] attempting to bind to IPv4 port 0 [nexus10.nlroc:31369] [[56810,0],0] assigned IPv4 port 50908 [nexus10.nlroc:31369] mca:oob:select: Adding component to end [nexus10.nlroc:31369] mca:oob:select: Found 1 active transports [nexus17.nlroc:60584] mca: base: components_register: registering plm 
components [nexus17.nlroc:60584] mca: base: components_register: found loaded component rsh [nexus17.nlroc:60584] mca: base: components_register: component rsh register function successful [nexus17.nlroc:60584] mca: base: components_open: opening plm components [nexus17.nlroc:60584] mca: base: components_open: found loaded component rsh [nexus17.nlroc:60584] mca: base: components_open: component rsh open function successful [nexus17.nlroc:60584] mca:base:select: Auto-selecting plm components [nexus17.nlroc:60584] mca:base:select:( plm) Querying component [rsh] [nexus17.nlroc:60584] mca:base:select:( plm) Query of component [rsh] set priority to 10 [nexus17.nlroc:60584] mca:base:select:( plm) Selected component [rsh] [nexus17.nlroc:60584] mca: base: components_register: registering oob components [nexus17.nlroc:60584] mca: base: components_register: found loaded component tcp [nexus17.nlroc:60584] mca: base: components_register: component tcp register function successful [nexus17.nlroc:60584] mca: base: components_open: opening oob components [nexus17.nlroc:60584] mca: base: components_open: found loaded component tcp [nexus17.nlroc:60584] mca: base: components_open: component tcp open function successful [nexus17.nlroc:60584] mca:oob:select: checking available component tcp [nexus17.nlroc:60584] mca:oob:select: Querying component [tcp] [nexus17.nlroc:60584] oob:tcp: component_available called [nexus17.nlroc:60584] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [nexus17.nlroc:60584] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init creating module for V4 address on interface en0 [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init adding 172.16.1.17 to our list of V4 connections [nexus17.nlroc:60584] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4 [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init creating module for V4 address on interface en2 [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:init adding 169.254.210.43 to our list of 
V4 connections [nexus17.nlroc:60584] [[56810,0],1] TCP STARTUP [nexus17.nlroc:60584] [[56810,0],1] attempting to bind to IPv4 port 0 [nexus17.nlroc:60584] [[56810,0],1] assigned IPv4 port 54613 [nexus17.nlroc:60584] mca:oob:select: Adding component to end [nexus17.nlroc:60584] mca:oob:select: Found 1 active transports [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri 3723100160.0;tcp:// 172.16.1.10:50908 [nexus17.nlroc:60584] [[56810,0],1]:set_addr checking if peer [[56810,0],0] is reachable via component tcp [nexus17.nlroc:60584] [[56810,0],1] oob:tcp: working peer [[56810,0],0] address tcp://172.16.1.10:50908 [nexus17.nlroc:60584] [[56810,0],1] PEER [[56810,0],0] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus17.nlroc:60584] [[56810,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus17.nlroc:60584] [[56810,0],1]:tcp set addr for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]: peer [[56810,0],0] is reachable via component tcp [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1]:tcp:processing set_peer cmd for interface en0 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:10 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:10 [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:476] queue pending to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: initiating connection to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:490] connect to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[56810,0],0] via interface en0 [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to connect to proc 
[[56810,0],0] via interface en0 on socket 9 [nexus17.nlroc:60584] [[56810,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[56810,0],0] on 172.16.1.10:50908 - 0 retries [nexus17.nlroc:60584] [[56810,0],1] waiting for connect completion to [[56810,0],0] - activating send event [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler CONNECTING [nexus17.nlroc:60584] [[56810,0],1]:tcp:complete_connect called for peer [[56810,0],0] on socket 9 [nexus17.nlroc:60584] [[56810,0],1] tcp_peer_complete_connect: sending ack to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] SEND CONNECT ACK [nexus17.nlroc:60584] [[56810,0],1] send blocking of 40 bytes to socket 9 [nexus17.nlroc:60584] [[56810,0],1] connect-ack sent to socket 9 [nexus17.nlroc:60584] [[56810,0],1] tcp_peer_complete_connect: setting read event on connection to [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0] mca_oob_tcp_listen_thread: new connection: (12, 0) 172.16.1.17:54614 [nexus10.nlroc:31369] [[56810,0],0] connection_handler: working connection (12, 35) 172.16.1.17:54614 [nexus10.nlroc:31369] [[56810,0],0] accept_connection: 172.16.1.17:54614 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called [nexus10.nlroc:31369] [[56810,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 12 [nexus10.nlroc:31369] [[56810,0],0] waiting for connect ack from UNKNOWN [nexus10.nlroc:31369] [[56810,0],0] connect ack received from UNKNOWN [nexus10.nlroc:31369] [[56810,0],0] connect-ack recvd from UNKNOWN [nexus10.nlroc:31369] [[56810,0],0] mca_oob_tcp_recv_connect: connection from new peer [nexus10.nlroc:31369] [[56810,0],0] connect-ack header from [[56810,0],1] is okay [nexus10.nlroc:31369] [[56810,0],0] waiting for connect ack from [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] connect ack received from [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] connect-ack version from [[56810,0],1] matches ours 
[nexus10.nlroc:31369] [[56810,0],0] connect-ack [[56810,0],1] authenticated [nexus10.nlroc:31369] [[56810,0],0] tcp:peer_accept called for peer [[56810,0],1] in state UNKNOWN on socket 12 [nexus10.nlroc:31369] [[56810,0],0] SEND CONNECT ACK [nexus10.nlroc:31369] [[56810,0],0] send blocking of 40 bytes to socket 12 [nexus10.nlroc:31369] [[56810,0],0] connect-ack sent to socket 12 [nexus10.nlroc:31369] [[56810,0],0]-[[56810,0],1] tcp_peer_connected on socket 12 [nexus10.nlroc:31369] [[56810,0],0]-[[56810,0],1] accepted: 172.16.1.10 - 172.16.1.17 nodelay 0 sndbuf 131072 rcvbuf 131072 flags 00000006 [nexus10.nlroc:31369] [[56810,0],0] tcp:set_module called for peer [[56810,0],1] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] RECV CONNECT ACK FROM [[56810,0],0] ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] waiting for connect ack from [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] connect ack received from [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus17.nlroc:60584] [[56810,0],1] connect-ack recvd from [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] connect-ack header from [[56810,0],0] is okay [nexus17.nlroc:60584] [[56810,0],1] waiting for connect ack from [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] connect ack received from [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] connect-ack version from [[56810,0],0] matches ours [nexus17.nlroc:60584] [[56810,0],1] connect-ack [[56810,0],0] authenticated [nexus17.nlroc:60584] [[56810,0],1]-[[56810,0],0] tcp_peer_connected on socket 9 [nexus17.nlroc:60584] [[56810,0],1]-[[56810,0],0] connected: 172.16.1.17 - 172.16.1.10 nodelay 0 sndbuf 131768 rcvbuf 131768 flags 00000006 [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler starting send/recv 
events [nexus17.nlroc:60[nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr 584] [[56810,0],1] tcp:set_module called for peer [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 9699 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 9699 BYTES FOR DEST [[56810,0],0] TAG 10 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 9699 BYTES ON SOCKET 9 [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri 3723100160.1;tcp:// 172.16.1.17,169.254.210.43:54613 [nexus10.nlroc:31369] [[56810,0],0]:set_addr checking if peer [[56810,0],1] is reachable via component tcp [nexus10.nlroc:31369] [[56810,0],0] oob:tcp: working peer [[56810,0],1] address tcp://172.16.1.17,169.254.210.43:54613 [nexus10.nlroc:31369] [[56810,0],0] PEER [[56810,0],1] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus10.nlroc:31369] [[56810,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus10.nlroc:31369] [[56810,0],0]:tcp set addr for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS 169.254.210.43 [nexus10.nlroc:31369] [[56810,0],0]: peer [[56810,0],1] is reachable via component tcp [nexus10.nlroc:31369] [[56810,0],0] OOB_SEND: rml_oob_send.c:199 [nexus10.nlroc:31369] [[56810,0],0]:tcp:processing set_peer cmd for interface en0 [nexus10.nlroc:31369] [[56810,0],0] oob:base:send to target [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] oob:base:send known transport for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:send_nb 
to peer [[56810,0],1]:1 [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb to peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:508] post send to [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:442] processing send to peer [[56810,0],1]:1 [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb: already connected to [[56810,0],1] - queueing for send [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:469] queue send to [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler called to send to peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler SENDING TO [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] MESSAGE SEND COMPLETE TO [[56810,0],1] OF 105 BYTES ON SOCKET 12 [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate new recv msg [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler read hdr [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate data region of size 105 [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri 3723100160.0;tcp:// 172.16.1.10:50908 [nexus10.nlroc:31369] [[56810,0],0]:set_addr peer [[56810,0],0] is me [nexus10.nlroc:31369] [[56810,0],0]: set_addr to uri 3723100160.1;tcp:// 172.16.1.17,169.254.210.43:54613 [nexus10.nlroc:31369] [[56810,0],0]:set_addr checking if peer [[56810,0],1] is reachable via component tcp [nexus10.nlroc:31369] [[56810,0],0] oob:tcp: working peer [[56810,0],1] address tcp://172.16.1.17,169.254.210.43:54613 [nexus10.nlroc:31369] [[56810,0],0] PEER [[56810,0],1] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus10.nlroc:31369] [[56810,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus10.nlroc:31369] [[56810,0],0]:tcp set addr for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS 169.254.210.43 [nexus10.nlroc:31369] [[56810,0],0]: peer 
[[56810,0],1] is reachable via component tcp [nexus10.nlroc:31369] [[56810,0],0] OOB_SEND: rml_oob_send.c:199 [nexus10.nlroc:31369] [[56810,0],0]:tcp:processing set_peer cmd for interface en0 [nexus10.nlroc:31369] [[56810,0],0] oob:base:send to target [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] oob:base:send known transport for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] oob:tcp:send_nb to peer [[56810,0],1]:15 [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb to peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:508] post send to [[56810,0],1] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED [nexus17.nlroc:60584] [[56810,0],1] RECVD COMPLETE MESSAGE FROM [[56810,0],0] (ORIGIN [[56810,0],0]) OF 105 BYTES FOR DEST [[56810,0],1] TAG 1 [nexus17.nlroc:60584] [[56810,0],1] DELIVERING TO RML [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:442] processing send to peer [[56810,0],1]:15 [nexus10.nlroc:31369] [[56810,0],0] tcp:send_nb: already connected to [[56810,0],1] - queueing for send [nexus10.nlroc:31369] [[56810,0],0]:[oob_tcp.c:469] queue send to [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler called to send to peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] tcp:send_handler SENDING TO [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] MESSAGE SEND COMPLETE TO [[56810,0],1] OF 885 BYTES ON SOCKET 12 [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate new recv msg [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler read hdr [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler allocate data region of size 885 [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler called for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:tcp:recv:handler CONNECTED 
[nexus17.nlroc:60584] [[56810,0],1] RECVD COMPLETE MESSAGE FROM [[56810,0],0] (ORIGIN [[56810,0],0]) OF 885 BYTES FOR DEST [[56810,0],1] TAG 15 [nexus17.nlroc:60584] [[56810,0],1] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri 3723100160.0;tcp:// 172.16.1.10:50908 [nexus17.nlroc:60584] [[56810,0],1]:set_addr checking if peer [[56810,0],0] is reachable via component tcp [nexus17.nlroc:60584] [[56810,0],1] oob:tcp: working peer [[56810,0],0] address tcp://172.16.1.10:50908 [nexus17.nlroc:60584] [[56810,0],1] PEER [[56810,0],0] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus17.nlroc:60584] [[56810,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus17.nlroc:60584] [[56810,0],1]:tcp set addr for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]: peer [[56810,0],0] is reachable via component tcp [nexus17.nlroc:60584] [[56810,0],1]: set_addr to uri 3723100160.1;tcp:// 172.16.1.17,169.254.210.43:54613 [nexus17.nlroc:60584] [[56810,0],1]:set_addr peer [[56810,0],1] is me [nexus17.nlroc:60584] [[56810,0],1]:tcp:processing set_peer cmd for interface en0 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:5 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:5 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg 
[nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 54 [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 54 BYTES ON SOCKET 9 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 54 BYTES FOR DEST [[56810,0],0] TAG 5 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus10.nlroc:31369] [[56810,0],0] plm:base:receive update proc state command from [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0] plm:base:receive got update_proc_state for job [56810,1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 183 [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already 
connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 183 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 118 BYTES ON SOCKET 9 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 183 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] 
oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 118 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 118 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60585] mca: base: components_register: registering oob components [nexus17.nlroc:60585] mca: base: components_register: found loaded component tcp [nexus17.nlroc:60585] mca: base: components_register: component tcp register function successful [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 294 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to 
[[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 199 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 203 BYTES ON SOCKET 9 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 294 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM 
[[56810,0],1] (ORIGIN [[56810,0],1]) OF 294 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 199 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 199 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60585] mca: base: components_register: registering plm components [nexus17.nlroc:60585] mca: base: components_register: found loaded component isolated [nexus17.nlroc:60585] mca: base: components_register: component isolated has no register or open function [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 203 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 203 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60585] mca: base: components_register: found loaded component rsh [nexus17.nlroc:60585] mca: base: components_register: component rsh register function successful [nexus17.nlroc:60585] mca: base: components_register: found loaded component slurm [nexus17.nlroc:60585] mca: base: components_register: component slurm register function successful [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] 
[nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 92 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 92 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to 
[[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 395 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 92 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 395 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 395 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] 
tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 572 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 1009 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 Package: Open MPI XXX@nexus10.nlroc Distribution [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new 
recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 572 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 773 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 572 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 1009 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] 
[nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 558 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 1009 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 484 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to 
[[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 747 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 773 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 773 BYTES FOR DEST [[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 591 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: 
rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler called for peer [[56810,0],1] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb: already connected to [[56810,0],0] - queueing for send [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:469] queue send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler called to send to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] tcp:send_handler SENDING TO [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] MESSAGE SEND COMPLETE TO [[56810,0],0] OF 635 BYTES ON SOCKET 9 [nexus17.nlroc:60584] [[56810,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60584] [[56810,0],1] oob:base:send to target [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:base:send known transport for peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1] oob:tcp:send_nb to peer [[56810,0],0]:2 [nexus17.nlroc:60584] [[56810,0],1] tcp:send_nb to peer [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:508] post send to [[56810,0],0] [nexus17.nlroc:60584] [[56810,0],1]:[oob_tcp.c:442] processing send to peer [[56810,0],0]:2 [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31369] [[56810,0],0]:tcp:recv:handler allocate data region of size 558 [nexus10.nlroc:31369] [[56810,0],0] RECVD COMPLETE MESSAGE FROM [[56810,0],1] (ORIGIN [[56810,0],1]) OF 558 BYTES FOR DEST 
[[56810,0],0] TAG 2 [nexus10.nlroc:31369] [[56810,0],0] DELIVERING TO RML Open MPI: 1.8.1 Open MPI repo revision: r31483 Open MPI release date: Apr 22, 2014 … it continues and finishes completely.
Case 3: it runs
/opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca oob_base_verbose 10 -mca plm_base_verbose 10 --debug-daemons -host nexus17 ompi_info
[nexus10.nlroc:31479] mca: base: components_register: registering plm components [nexus10.nlroc:31479] mca: base: components_register: found loaded component isolated [nexus10.nlroc:31479] mca: base: components_register: component isolated has no register or open function [nexus10.nlroc:31479] mca: base: components_register: found loaded component rsh [nexus10.nlroc:31479] mca: base: components_register: component rsh register function successful [nexus10.nlroc:31479] mca: base: components_register: found loaded component slurm [nexus10.nlroc:31479] mca: base: components_register: component slurm register function successful [nexus10.nlroc:31479] mca: base: components_open: opening plm components [nexus10.nlroc:31479] mca: base: components_open: found loaded component isolated [nexus10.nlroc:31479] mca: base: components_open: component isolated open function successful [nexus10.nlroc:31479] mca: base: components_open: found loaded component rsh [nexus10.nlroc:31479] mca: base: components_open: component rsh open function successful [nexus10.nlroc:31479] mca: base: components_open: found loaded component slurm [nexus10.nlroc:31479] mca: base: components_open: component slurm open function successful [nexus10.nlroc:31479] mca:base:select: Auto-selecting plm components [nexus10.nlroc:31479] mca:base:select:( plm) Querying component [isolated] [nexus10.nlroc:31479] mca:base:select:( plm) Query of component [isolated] set priority to 0 [nexus10.nlroc:31479] mca:base:select:( plm) Querying component [rsh] [nexus10.nlroc:31479] mca:base:select:( plm) Query of component [rsh] set priority to 10 [nexus10.nlroc:31479] 
mca:base:select:( plm) Querying component [slurm] [nexus10.nlroc:31479] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module [nexus10.nlroc:31479] mca:base:select:( plm) Selected component [rsh] [nexus10.nlroc:31479] mca: base: close: component isolated closed [nexus10.nlroc:31479] mca: base: close: unloading component isolated [nexus10.nlroc:31479] mca: base: close: component slurm closed [nexus10.nlroc:31479] mca: base: close: unloading component slurm [nexus10.nlroc:31479] mca: base: components_register: registering oob components [nexus10.nlroc:31479] mca: base: components_register: found loaded component tcp [nexus10.nlroc:31479] mca: base: components_register: component tcp register function successful [nexus10.nlroc:31479] mca: base: components_open: opening oob components [nexus10.nlroc:31479] mca: base: components_open: found loaded component tcp [nexus10.nlroc:31479] mca: base: components_open: component tcp open function successful [nexus10.nlroc:31479] mca:oob:select: checking available component tcp [nexus10.nlroc:31479] mca:oob:select: Querying component [tcp] [nexus10.nlroc:31479] oob:tcp: component_available called [nexus10.nlroc:31479] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [nexus10.nlroc:31479] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:init creating module for V4 address on interface en0 [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:init adding 172.16.1.10 to our list of V4 connections [nexus10.nlroc:31479] [[56724,0],0] TCP STARTUP [nexus10.nlroc:31479] [[56724,0],0] attempting to bind to IPv4 port 0 [nexus10.nlroc:31479] [[56724,0],0] assigned IPv4 port 50923 [nexus10.nlroc:31479] mca:oob:select: Adding component to end [nexus10.nlroc:31479] mca:oob:select: Found 1 active transports Daemon was launched on nexus17.nlroc - beginning to initialize [nexus17.nlroc:60663] mca: base: components_register: registering plm components [nexus17.nlroc:60663] mca: base: 
components_register: found loaded component rsh [nexus17.nlroc:60663] mca: base: components_register: component rsh register function successful [nexus17.nlroc:60663] mca: base: components_open: opening plm components [nexus17.nlroc:60663] mca: base: components_open: found loaded component rsh [nexus17.nlroc:60663] mca: base: components_open: component rsh open function successful [nexus17.nlroc:60663] mca:base:select: Auto-selecting plm components [nexus17.nlroc:60663] mca:base:select:( plm) Querying component [rsh] [nexus17.nlroc:60663] mca:base:select:( plm) Query of component [rsh] set priority to 10 [nexus17.nlroc:60663] mca:base:select:( plm) Selected component [rsh] [nexus17.nlroc:60663] mca: base: components_register: registering oob components [nexus17.nlroc:60663] mca: base: components_register: found loaded component tcp [nexus17.nlroc:60663] mca: base: components_register: component tcp register function successful [nexus17.nlroc:60663] mca: base: components_open: opening oob components [nexus17.nlroc:60663] mca: base: components_open: found loaded component tcp [nexus17.nlroc:60663] mca: base: components_open: component tcp open function successful [nexus17.nlroc:60663] mca:oob:select: checking available component tcp [nexus17.nlroc:60663] mca:oob:select: Querying component [tcp] [nexus17.nlroc:60663] oob:tcp: component_available called [nexus17.nlroc:60663] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [nexus17.nlroc:60663] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init creating module for V4 address on interface en0 [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init adding 172.16.1.17 to our list of V4 connections [nexus17.nlroc:60663] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4 [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init creating module for V4 address on interface en2 [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:init adding 169.254.210.43 to our list of V4 connections [nexus17.nlroc:60663] 
[[56724,0],1] TCP STARTUP [nexus17.nlroc:60663] [[56724,0],1] attempting to bind to IPv4 port 0 [nexus17.nlroc:60663] [[56724,0],1] assigned IPv4 port 54631 [nexus17.nlroc:60663] mca:oob:select: Adding component to end [nexus17.nlroc:60663] mca:oob:select: Found 1 active transports Daemon [[56724,0],1] checking in as pid 60663 on host nexus17 [nexus17.nlroc:60663] [[56724,0],1] orted: up and running - waiting for commands! [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri 3717464064.0;tcp:// 172.16.1.10:50923 [nexus17.nlroc:60663] [[56724,0],1]:set_addr checking if peer [[56724,0],0] is reachable via component tcp [nexus17.nlroc:60663] [[56724,0],1] oob:tcp: working peer [[56724,0],0] address tcp://172.16.1.10:50923 [nexus17.nlroc:60663] [[56724,0],1] PEER [[56724,0],0] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus17.nlroc:60663] [[56724,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus17.nlroc:60663] [[56724,0],1]:tcp set addr for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]: peer [[56724,0],0] is reachable via component tcp [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1]:tcp:processing set_peer cmd for interface en0 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:10 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:10 [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:476] queue pending to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: initiating connection to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:490] connect to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[56724,0],0] via 
interface en0 [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[56724,0],0] via interface en0 on socket 9 [nexus17.nlroc:60663] [[56724,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[56724,0],0] on 172.16.1.10:50923 - 0 retries [nexus17.nlroc:60663] [[56724,0],1] waiting for connect completion to [[56724,0],0] - activating send event [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler CONNECTING [nexus17.nlroc:60663] [[56724,0],1]:tcp:complete_connect called for peer [[56724,0],0] on socket 9 [nexus10.nlroc:31479] [[56724,0],0] mca_oob_tcp_listen_thread: new connection: (12, 0) 172.16.1.17:54632 [nexus17.nlroc:60663] [[56724,0],1] tcp_peer_complete_connect: sending ack to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] SEND CONNECT ACK [nexus17.nlroc:60663] [[56724,0],1] send blocking of 40 bytes to socket 9 [nexus17.nlroc:60663] [[56724,0],1] connect-ack sent to socket 9 [nexus17.nlroc:60663] [[56724,0],1] tcp_peer_complete_connect: setting read event on connection to [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0] connection_handler: working connection (12, 35) 172.16.1.17:54632 [nexus10.nlroc:31479] [[56724,0],0] accept_connection: 172.16.1.17:54632 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called [nexus10.nlroc:31479] [[56724,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 12 [nexus10.nlroc:31479] [[56724,0],0] waiting for connect ack from UNKNOWN [nexus10.nlroc:31479] [[56724,0],0] connect ack received from UNKNOWN [nexus10.nlroc:31479] [[56724,0],0] connect-ack recvd from UNKNOWN [nexus10.nlroc:31479] [[56724,0],0] mca_oob_tcp_recv_connect: connection from new peer [nexus10.nlroc:31479] [[56724,0],0] connect-ack header from [[56724,0],1] is okay [nexus10.nlroc:31479] [[56724,0],0] waiting for connect ack from [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] connect ack received from 
[[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] connect-ack version from [[56724,0],1] matches ours [nexus10.nlroc:31479] [[56724,0],0] connect-ack [[56724,0],1] authenticated [nexus10.nlroc:31479] [[56724,0],0] tcp:peer_accept called for peer [[56724,0],1] in state UNKNOWN on socket 12 [nexus10.nlroc:31479] [[56724,0],0] SEND CONNECT ACK [nexus10.nlroc:31479] [[56724,0],0] send blocking of 40 bytes to socket 12 [nexus10.nlroc:31479] [[56724,0],0] connect-ack sent to socket 12 [nexus10.nlroc:31479] [[56724,0],0]-[[56724,0],1] tcp_peer_connected on socket 12 [nexus10.nlroc:31479] [[56724,0],0]-[[56724,0],1] accepted: 172.16.1.10 - 172.16.1.17 nodelay 0 sndbuf 131072 rcvbuf 131072 flags 00000006 [nexus10.nlroc:31479] [[56724,0],0] tcp:set_module called for peer [[56724,0],1] [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] RECV CONNECT ACK FROM [[56724,0],0] ON SOCKET 9 [nexus17.nlroc:60663] [[56724,0],1] waiting for connect ack from [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] connect ack received from [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] connect-ack recvd from [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] connect-ack header from [[56724,0],0] is okay [nexus17.nlroc:60663] [[56724,0],1] waiting for connect ack from [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] connect ack received from [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 9699 [nexus17.nlroc:60663] [[56724,0],1] connect-ack version from [[56724,0],0] matches ours [nexus17.nlroc:60663] [[56724,0],1] connect-ack [[56724,0],0] authenticated [nexus17.nlroc:60663] 
[[56724,0],1]-[[56724,0],0] tcp_peer_connected on socket 9 [nexus17.nlroc:60663] [[56724,0],1]-[[56724,0],0] connected: 172.16.1.17 - 172.16.1.10 nodelay 0 sndbuf 131768 rcvbuf 131768 flags 00000006 [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler starting send/recv events [nexus17.nlroc:60663] [[56724,0],1] tcp:set_module called for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 9699 BYTES ON SOCKET 9 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 9699 BYTES FOR DEST [[56724,0],0] TAG 10 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri 3717464064.1;tcp:// 172.16.1.17,169.254.210.43:54631 [nexus10.nlroc:31479] [[56724,0],0]:set_addr checking if peer [[56724,0],1] is reachable via component tcp [nexus10.nlroc:31479] [[56724,0],0] oob:tcp: working peer [[56724,0],1] address tcp://172.16.1.17,169.254.210.43:54631 [nexus10.nlroc:31479] [[56724,0],0] PEER [[56724,0],1] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus10.nlroc:31479] [[56724,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus10.nlroc:31479] [[56724,0],0]:tcp set addr for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS 169.254.210.43 [nexus10.nlroc:31479] [[56724,0],0]: peer [[56724,0],1] is reachable via component tcp [nexus10.nlroc:31479] [[56724,0],0]:tcp:processing set_peer cmd for interface en0 [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri 3717464064.0;tcp:// 172.16.1.10:50923 [nexus10.nlroc:31479] [[56724,0],0]:set_addr peer [[56724,0],0] is me [nexus10.nlroc:31479] [[56724,0],0]: set_addr to uri 3717464064.1;tcp:// 172.16.1.17,169.254.210.43:54631 [nexus10.nlroc:31479] 
[[56724,0],0]:set_addr checking if peer [[56724,0],1] is reachable via component tcp [nexus10.nlroc:31479] [[56724,0],0] oob:tcp: working peer [[56724,0],1] address tcp://172.16.1.17,169.254.210.43:54631 [nexus10.nlroc:31479] [[56724,0],0] PEER [[56724,0],1] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus10.nlroc:31479] [[56724,0],0] PASSING ADDR 172.16.1.17 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus10.nlroc:31479] [[56724,0],0]:tcp set addr for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] UNFOUND KERNEL INDEX -13 FOR ADDRESS 169.254.210.43 [nexus10.nlroc:31479] [[56724,0],0]: peer [[56724,0],1] is reachable via component tcp [nexus10.nlroc:31479] [[56724,0],0] OOB_SEND: rml_oob_send.c:199 [nexus10.nlroc:31479] [[56724,0],0]:tcp:processing set_peer cmd for interface en0 [nexus10.nlroc:31479] [[56724,0],0] oob:base:send to target [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] oob:base:send known transport for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] oob:tcp:send_nb to peer [[56724,0],1]:15 [nexus10.nlroc:31479] [[56724,0],0] tcp:send_nb to peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:508] post send to [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] orted_cmd: received add_local_procs [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:442] processing send to peer [[56724,0],1]:15 [nexus10.nlroc:31479] [[56724,0],0] tcp:send_nb: already connected to [[56724,0],1] - queueing for send [nexus10.nlroc:31479] [[56724,0],0]:[oob_tcp.c:469] queue send to [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] tcp:send_handler called to send to peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] tcp:send_handler SENDING TO [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] MESSAGE SEND COMPLETE TO [[56724,0],1] OF 956 BYTES ON SOCKET 12 [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler CONNECTED [nexus17.nlroc:60663] 
[[56724,0],1]:tcp:recv:handler allocate new recv msg [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler read hdr [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler allocate data region of size 956 [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler called for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:tcp:recv:handler CONNECTED [nexus17.nlroc:60663] [[56724,0],1] RECVD COMPLETE MESSAGE FROM [[56724,0],0] (ORIGIN [[56724,0],0]) OF 956 BYTES FOR DEST [[56724,0],1] TAG 15 [nexus17.nlroc:60663] [[56724,0],1] DELIVERING TO RML [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri 3717464064.0;tcp:// 172.16.1.10:50923 [nexus17.nlroc:60663] [[56724,0],1]:set_addr checking if peer [[56724,0],0] is reachable via component tcp [nexus17.nlroc:60663] [[56724,0],1] oob:tcp: working peer [[56724,0],0] address tcp://172.16.1.10:50923 [nexus17.nlroc:60663] [[56724,0],1] PEER [[56724,0],0] MAY BE REACHABLE USING MODULE AT KINDEX 2 INTERFACE en0 [nexus17.nlroc:60663] [[56724,0],1] PASSING ADDR 172.16.1.10 TO INTERFACE en0 AT KERNEL INDEX 2 [nexus17.nlroc:60663] [[56724,0],1]:tcp set addr for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]: peer [[56724,0],0] is reachable via component tcp [nexus17.nlroc:60663] [[56724,0],1]: set_addr to uri 3717464064.1;tcp:// 172.16.1.17,169.254.210.43:54631 [nexus17.nlroc:60663] [[56724,0],1]:set_addr peer [[56724,0],1] is me [nexus17.nlroc:60663] [[56724,0],1]:tcp:processing set_peer cmd for interface en0 [nexus17.nlroc:60663] [[56724,0],1] orted_cmd: received add_local_procs [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:5 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] 
[nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 54 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 54 BYTES FOR DEST [[56724,0],0] TAG 5 [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:5 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing for send [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 54 BYTES ON SOCKET 9 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus10.nlroc:31479] [[56724,0],0] plm:base:receive update proc state command from [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0] plm:base:receive got update_proc_state for job [56724,1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] 
[[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 183 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 183 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing for send [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 183 BYTES ON SOCKET 9 [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 [nexus17.nlroc:[nexus17.nlroc:60664] mca: base: components_register: registering oob components [nexus17.nlroc:60664] mca: base: components_register: found loaded component tcp 60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing for send [nexus17.nlroc:60663] 
[[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 118 BYTES ON SOCKET 9 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 118 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 118 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus17.nlroc:60664] mca: base: components_register: component tcp register function successful [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 294 [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing 
for send [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 294 BYTES ON SOCKET 9 [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing for send [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 282 BYTES ON SOCKET 9 [nexus17.nlroc:60663] [[56724,0],1] OOB_SEND: rml_oob_send.c:199 [nexus17.nlroc:60663] [[56724,0],1] oob:base:send to target [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:base:send known transport for peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] oob:tcp:send_nb to peer [[56724,0],0]:2 [nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:508] post send to [[56724,0],0] [nexus17.nlroc:60663] [[56724,[nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] 0],1]:[oob_tcp.c:442] processing send to peer [[56724,0],0]:2 
[nexus17.nlroc:60663] [[56724,0],1] tcp:send_nb: already connected to [[56724,0],0] - queueing for send [nexus17.nlroc:60663] [[56724,0],1]:[oob_tcp.c:469] queue send to [[56724,0],0] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 294 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler called to send to peer [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] tcp:send_handler SENDING TO [[56724,0],0] [nexus17.nlroc:60663] [[56724,0],1] MESSAGE SEND COMPLETE TO [[56724,0],0] OF 120 BYTES ON SOCKET 9 [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 282 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 282 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 120 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 120 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus17.nlroc:60664] mca: base: components_register: registering plm components [nexus17.nlroc:60664] mca: base: components_register: found loaded 
component isolated [nexus17.nlroc:60664] mca: base: components_register: component isolated has no register or open function [nexus17.nlroc:60664] mca: base: components_register: found loaded component rsh [nexus17.nlroc:60664] mca: base: components_register: component rsh register function successful [nexus17.nlroc:60664] mca: base: components_register: found loaded component slurm [nexus17.nlroc:60664] mca: base: components_register: component slurm register function successful [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 92 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 92 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler called for peer [[56724,0],1] [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler CONNECTED [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate new recv msg [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler read hdr [nexus10.nlroc:31479] [[56724,0],0]:tcp:recv:handler allocate data region of size 560 [nexus10.nlroc:31479] [[56724,0],0] RECVD COMPLETE MESSAGE FROM [[56724,0],1] (ORIGIN [[56724,0],1]) OF 560 BYTES FOR DEST [[56724,0],0] TAG 2 [nexus10.nlroc:31479] [[56724,0],0] DELIVERING TO RML Package: Open MPI XXXX@nexus10.nlroc Distribution … and continue until the end On Tue, Jul 15, 2014 at 2:58 PM, Ralph Castain <r...@open-mpi.org> wrote: > I'm afraid I don't understand your comment about "another mpi process". > Looking at your output, it would appear that there is something going on > with host nexus17. 
In both cases, mpirun is launching a single daemon onto > only one other node - the only difference was in the node being used. The > "no_tree_spawn" flag did nothing as that only applies when there are > multiple nodes being used. > > I would check to see if there is a firewall between nexus10 and nexus17. > You can also add -mca oob_base_verbose 10 to your cmd line and see if the > daemon on nexus17 is able to connect back to mpirun., and add > --debug-daemons to see any error messages that daemon may be trying to > report. > > > On Jul 15, 2014, at 3:08 AM, Ricardo Fernández-Perea < > rfernandezpe...@gmail.com> wrote: > > I have try if another mpi process is running in the node already the > process run > > $ricardo$ /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca > plm_base_verbose 10 -host nexus16 ompi_info > [nexus10.nlroc:27397] mca: base: components_register: registering plm > components > [nexus10.nlroc:27397] mca: base: components_register: found loaded > component isolated > [nexus10.nlroc:27397] mca: base: components_register: component isolated > has no register or open function > [nexus10.nlroc:27397] mca: base: components_register: found loaded > component rsh > [nexus10.nlroc:27397] mca: base: components_register: component rsh > register function successful > [nexus10.nlroc:27397] mca: base: components_register: found loaded > component slurm > [nexus10.nlroc:27397] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:27397] mca: base: components_open: opening plm components > [nexus10.nlroc:27397] mca: base: components_open: found loaded component > isolated > [nexus10.nlroc:27397] mca: base: components_open: component isolated open > function successful > [nexus10.nlroc:27397] mca: base: components_open: found loaded component > rsh > [nexus10.nlroc:27397] mca: base: components_open: component rsh open > function successful > [nexus10.nlroc:27397] mca: base: components_open: found loaded 
component > slurm > [nexus10.nlroc:27397] mca: base: components_open: component slurm open > function successful > [nexus10.nlroc:27397] mca:base:select: Auto-selecting plm components > [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [isolated] > [nexus10.nlroc:27397] mca:base:select:( plm) Query of component > [isolated] set priority to 0 > [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [rsh] > [nexus10.nlroc:27397] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus10.nlroc:27397] mca:base:select:( plm) Querying component [slurm] > [nexus10.nlroc:27397] mca:base:select:( plm) Skipping component [slurm]. > Query failed to return a module > [nexus10.nlroc:27397] mca:base:select:( plm) Selected component [rsh] > [nexus10.nlroc:27397] mca: base: close: component isolated closed > [nexus10.nlroc:27397] mca: base: close: unloading component isolated > [nexus10.nlroc:27397] mca: base: close: component slurm closed > [nexus10.nlroc:27397] mca: base: close: unloading component slurm > [nexus10.nlroc:27397] [[52326,0],0] plm:base:receive update proc state > command from [[52326,0],1] > [nexus10.nlroc:27397] [[52326,0],0] plm:base:receive got update_proc_state > for job [52326,1] > [nexus16.nlroc:59687] mca: base: components_register: registering plm > components > [nexus16.nlroc:59687] mca: base: components_register: found loaded > component isolated > [nexus16.nlroc:59687] mca: base: components_register: component isolated > has no register or open function > [nexus16.nlroc:59687] mca: base: components_register: found loaded > component rsh > [nexus16.nlroc:59687] mca: base: components_register: component rsh > register function successful > [nexus16.nlroc:59687] mca: base: components_register: found loaded > component slurm > [nexus16.nlroc:59687] mca: base: components_register: component slurm > register function successful > Package: Open MPI XXXX@nexus10.nlroc Distribution > Open MPI: 1.8.1 > Open MPI repo 
revision: r31483 > Open MPI release date: Apr 22, 2014 > Open RTE: 1.8.1 > … > > but if the compute node has not a mpi process running in it it already > hangs as > > /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca > plm_base_verbose 10 -host nexus17 ompi_info > [nexus10.nlroc:27438] mca: base: components_register: registering plm > components > [nexus10.nlroc:27438] mca: base: components_register: found loaded > component isolated > [nexus10.nlroc:27438] mca: base: components_register: component isolated > has no register or open function > [nexus10.nlroc:27438] mca: base: components_register: found loaded > component rsh > [nexus10.nlroc:27438] mca: base: components_register: component rsh > register function successful > [nexus10.nlroc:27438] mca: base: components_register: found loaded > component slurm > [nexus10.nlroc:27438] mca: base: components_register: component slurm > register function successful > [nexus10.nlroc:27438] mca: base: components_open: opening plm components > [nexus10.nlroc:27438] mca: base: components_open: found loaded component > isolated > [nexus10.nlroc:27438] mca: base: components_open: component isolated open > function successful > [nexus10.nlroc:27438] mca: base: components_open: found loaded component > rsh > [nexus10.nlroc:27438] mca: base: components_open: component rsh open > function successful > [nexus10.nlroc:27438] mca: base: components_open: found loaded component > slurm > [nexus10.nlroc:27438] mca: base: components_open: component slurm open > function successful > [nexus10.nlroc:27438] mca:base:select: Auto-selecting plm components > [nexus10.nlroc:27438] mca:base:select:( plm) Querying component [isolated] > [nexus10.nlroc:27438] mca:base:select:( plm) Query of component > [isolated] set priority to 0 > [nexus10.nlroc:27438] mca:base:select:( plm) Querying component [rsh] > [nexus10.nlroc:27438] mca:base:select:( plm) Query of component [rsh] set > priority to 10 > [nexus10.nlroc:27438] mca:base:select:( 
plm) Querying component [slurm]
> [nexus10.nlroc:27438] mca:base:select:( plm) Skipping component [slurm].
> Query failed to return a module
> [nexus10.nlroc:27438] mca:base:select:( plm) Selected component [rsh]
> [nexus10.nlroc:27438] mca: base: close: component isolated closed
> [nexus10.nlroc:27438] mca: base: close: unloading component isolated
> [nexus10.nlroc:27438] mca: base: close: component slurm closed
> [nexus10.nlroc:27438] mca: base: close: unloading component slurm
>
> and it stops there
>
>
>
>
> On Mon, Jul 14, 2014 at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Hmmm...no, it worked just fine for me. It sounds like something else is
>> going on.
>>
>> Try configuring OMPI with --enable-debug, and then add -mca
>> plm_base_verbose 10 to get a better sense of what is going on.
>>
>>
>> On Jul 14, 2014, at 10:27 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I confess I haven't tested no_tree_spawn in ages, so it is quite possible
>> it has suffered bit rot. I can try to take a look at it in a bit
>>
>>
>> On Jul 14, 2014, at 10:13 AM, Ricardo Fernández-Perea <
>> rfernandezpe...@gmail.com> wrote:
>>
>> Thank you for the fast answer
>>
>> While that resolves my problem with cross ssh authentication, a command such as
>>
>> /opt/openmpi/bin/mpirun --mca mtl mx --mca pml cm --mca
>> plm_rsh_no_tree_spawn 1 -hostfile hostfile ompi_info
>>
>> just hangs with no output, and although there is an ssh connection, no orte
>> program is started on the destination nodes
>>
>> and while
>>
>> /opt/openmpi/bin/mpirun -host host18 ompi_info
>>
>> works
>>
>> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -host host18
>> ompi_info
>>
>> hangs. Is there some condition on the use of this parameter?
>>
>> Yours truly
>>
>> Ricardo
>>
>>
>>
>> On Mon, Jul 14, 2014 at 6:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> During the 1.7 series and for all follow-on series, OMPI changed to a
>>> mode where it launches a daemon on all allocated nodes at the startup of
>>> mpirun. This allows us to determine the hardware topology of the nodes and
>>> take that into account when mapping. You can override that behavior by
>>> either adding --novm to your cmd line (which will impact your
>>> mapping/binding options), or by specifying the hosts to use by editing your
>>> hostfile, or adding --host host1,host2 to your cmd line
>>>
>>> The rsh launcher defaults to a tree-based pattern, thus requiring that
>>> we be able to ssh from one compute node to another. You can change that to
>>> a less scalable direct mode by adding
>>>
>>> --mca plm_rsh_no_tree_spawn 1
>>>
>>> to the cmd line
>>>
>>>
>>> On Jul 14, 2014, at 9:21 AM, Ricardo Fernández-Perea <
>>> rfernandezpe...@gmail.com> wrote:
>>>
>>> > I'm trying to update to Open MPI 1.8.1 through ssh and Myrinet
>>> >
>>> > running a command as
>>> >
>>> > /opt/openmpi/bin/mpirun --verbose --mca mtl mx --mca pml cm -hostfile hostfile -np 16
>>> >
>>> > when the hostfile contains only two nodes as
>>> >
>>> > host1 slots=8 max-slots=8
>>> > host2 slots=8 max-slots=8
>>> >
>>> > it runs perfectly, but when the hostfile has a third node as
>>> >
>>> >
>>> > host1 slots=8 max-slots=8
>>> > host2 slots=8 max-slots=8
>>> > host3 slots=8 max-slots=8
>>> >
>>> > it tries to establish an ssh connection between the running host1 and
>>> > host3, which should not run any process; that connection fails, hanging
>>> > the process without signaling.
>>> > >>> > >>> > my ompi_info is as follow >>> > >>> > Package: Open MPI XXX Distribution >>> > Open MPI: 1.8.1 >>> > Open MPI repo revision: r31483 >>> > Open MPI release date: Apr 22, 2014 >>> > Open RTE: 1.8.1 >>> > Open RTE repo revision: r31483 >>> > Open RTE release date: Apr 22, 2014 >>> > OPAL: 1.8.1 >>> > OPAL repo revision: r31483 >>> > OPAL release date: Apr 22, 2014 >>> > MPI API: 3.0 >>> > Ident string: 1.8.1 >>> > Prefix: /opt/openmpi >>> > Configured architecture: x86_64-apple-darwin9.8.0 >>> > Configure host: XXXX >>> > Configured by: XXXX >>> > Configured on: Thu Jun 12 10:37:33 CEST 2014 >>> > Configure host: XXXX >>> > Built by: XXXX >>> > Built on: Thu Jun 12 11:13:16 CEST 2014 >>> > Built host: XXXX >>> > C bindings: yes >>> > C++ bindings: yes >>> > Fort mpif.h: yes (single underscore) >>> > Fort use mpi: yes (full: ignore TKR) >>> > Fort use mpi size: deprecated-ompi-info-value >>> > Fort use mpi_f08: yes >>> > Fort mpi_f08 compliance: The mpi_f08 module is available, but due to >>> > limitations in the ifort compiler, does not >>> support >>> > the following: array subsections, direct >>> passthru >>> > (where possible) to underlying Open MPI's C >>> > functionality >>> > Fort mpi_f08 subarrays: no >>> > Java bindings: no >>> > Wrapper compiler rpath: unnecessary >>> > C compiler: icc >>> > C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc >>> > C compiler family name: INTEL >>> > C compiler version: 1110.20091130 >>> > C++ compiler: icpc >>> > C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc >>> > Fort compiler: ifort >>> > Fort compiler abs: >>> /opt/intel/Compiler/11.1/080/bin/intel64/ifort >>> > Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::) >>> > Fort 08 assumed shape: no >>> > Fort optional args: yes >>> > Fort BIND(C) (all): yes >>> > Fort ISO_C_BINDING: yes >>> > Fort SUBROUTINE BIND(C): yes >>> > Fort TYPE,BIND(C): yes >>> > Fort T,BIND(C,name="a"): yes >>> > Fort PRIVATE: yes >>> > Fort 
PROTECTED: yes >>> > Fort ABSTRACT: yes >>> > Fort ASYNCHRONOUS: yes >>> > Fort PROCEDURE: yes >>> > Fort f08 using wrappers: yes >>> > C profiling: yes >>> > C++ profiling: yes >>> > Fort mpif.h profiling: yes >>> > Fort use mpi profiling: yes >>> > Fort use mpi_f08 prof: yes >>> > C++ exceptions: no >>> > Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL >>> support: yes, >>> > OMPI progress: no, ORTE progress: yes, Event >>> lib: >>> > yes) >>> > Sparse Groups: no >>> > Internal debug support: no >>> > MPI interface warnings: yes >>> > MPI parameter check: runtime >>> > Memory profiling support: no >>> > Memory debugging support: no >>> > libltdl support: yes >>> > Heterogeneous support: no >>> > mpirun default --prefix: no >>> > MPI I/O support: yes >>> > MPI_WTIME support: gettimeofday >>> > Symbol vis. support: yes >>> > Host topology support: yes >>> > MPI extensions: >>> > FT Checkpoint support: no (checkpoint thread: no) >>> > C/R Enabled Debugging: no >>> > VampirTrace support: yes >>> > MPI_MAX_PROCESSOR_NAME: 256 >>> > MPI_MAX_ERROR_STRING: 256 >>> > MPI_MAX_OBJECT_NAME: 64 >>> > MPI_MAX_INFO_KEY: 36 >>> > MPI_MAX_INFO_VAL: 256 >>> > MPI_MAX_PORT_NAME: 1024 >>> > MPI_MAX_DATAREP_STRING: 128 >>> > >>> > >>> > _______________________________________________ >>> > users mailing list >>> > us...@open-mpi.org >>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>> > Link to this post: >>> http://www.open-mpi.org/community/lists/users/2014/07/24764.php >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>> Link to this post: >>> http://www.open-mpi.org/community/lists/users/2014/07/24765.php >>> >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >> Link to this post: >> 
http://www.open-mpi.org/community/lists/users/2014/07/24766.php
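A rough way to act on the suggestion in the thread to check for a firewall between nexus10 and nexus17 is to test plain TCP reachability from the compute node back to the node running mpirun. The sketch below is only an illustration and is not part of Open MPI; the helper name can_connect is made up here. The address and port would have to be read from the "set_addr to uri ...;tcp://<ip>:<port>" line in the oob_base_verbose output of the current run, since mpirun picks the port dynamically, and the check must happen while that mpirun is still alive.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result means the
    connection was refused, timed out, or was otherwise blocked; it does not
    say which of those happened.
    """
    try:
        # create_connection resolves the host and opens the TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call with placeholder values copied from the log above; the port
# changes on every mpirun invocation, so substitute the one from your own run:
#   can_connect("172.16.1.10", 50923)
```

If this returns False on nexus17 but True on a node that works (such as nexus16), that would point toward filtering between the hosts rather than toward the no_tree_spawn flag.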