Thank you. I added the parameters, and I figured out that the iptables firewall was interfering with the connection, so I disabled it on both machines.
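On both machines I did something like the following (a sketch; the exact service name depends on the distro, and on a box running firewalld it would be "systemctl stop firewalld" instead):

    # Stop the iptables service and keep it from coming back on reboot
    # (assumes a systemd-based system with the classic iptables service)
    sudo systemctl stop iptables
    sudo systemctl disable iptables

But now I get another error: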
[superuser@localhost ~]$ mpirun --host 192.168.54.56 --leave-session-attached -mca plm_base_verbose 5 -mca oob_base_verbose 5 hostname clear
[localhost.localdomain:10884] mca:base:select:( plm) Querying component [isolated]
[localhost.localdomain:10884] mca:base:select:( plm) Query of component [isolated] set priority to 0
[localhost.localdomain:10884] mca:base:select:( plm) Querying component [rsh]
[localhost.localdomain:10884] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[localhost.localdomain:10884] mca:base:select:( plm) Query of component [rsh] set priority to 10
[localhost.localdomain:10884] mca:base:select:( plm) Querying component [slurm]
[localhost.localdomain:10884] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[localhost.localdomain:10884] mca:base:select:( plm) Selected component [rsh]
[localhost.localdomain:10884] plm:base:set_hnp_name: initial bias 10884 nodename hash 724106151
[localhost.localdomain:10884] plm:base:set_hnp_name: final jobfam 64011
[localhost.localdomain:10884] mca:oob:select: checking available component tcp
[localhost.localdomain:10884] mca:oob:select: Querying component [tcp]
[localhost.localdomain:10884] oob:tcp: component_available called
[localhost.localdomain:10884] [[64011,0],0] creating OOB-TCP module for interface eth0
[localhost.localdomain:10884] [[64011,0],0] creating OOB-TCP module for interface virbr0
[localhost.localdomain:10884] [[64011,0],0] TCP STARTUP
[localhost.localdomain:10884] [[64011,0],0] attempting to bind to IPv4 port 0
[localhost.localdomain:10884] mca:oob:select: Adding component to end
[localhost.localdomain:10884] mca:oob:select: Found 1 active transports
[localhost.localdomain:10884] [[64011,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[localhost.localdomain:10884] [[64011,0],0] plm:base:receive start comm
[localhost.localdomain:10884] [[64011,0],0] plm:base:setup_job
[localhost.localdomain:10884] [[64011,0],0] plm:base:setup_vm
[localhost.localdomain:10884] [[64011,0],0] plm:base:setup_vm creating map
[localhost.localdomain:10884] [[64011,0],0] setup:vm: working unmanaged allocation
[localhost.localdomain:10884] [[64011,0],0] using dash_host
[localhost.localdomain:10884] [[64011,0],0] checking node 192.168.54.56
[localhost.localdomain:10884] [[64011,0],0] plm:base:setup_vm add new daemon [[64011,0],1]
[localhost.localdomain:10884] [[64011,0],0] plm:base:setup_vm assigning new daemon [[64011,0],1] to node 192.168.54.56
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: launching vm
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: local shell: 0 (bash)
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: assuming same remote shell as local shell
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: remote shell: 0 (bash)
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: final template argv: /usr/bin/ssh <template> orted -mca ess env -mca orte_ess_jobid 4195024896 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 2 -mca orte_hnp_uri "4195024896.0;tcp://192.168.54.137,192.168.122.1:45032" --tree-spawn -mca plm_base_verbose 5 -mca oob_base_verbose 5 -mca plm rsh -mca orte_leave_session_attached 1
[localhost.localdomain:10884] [[64011,0],0] plm:rsh:launch daemon 0 not a child of mine
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: adding node 192.168.54.56 to launch list
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: activating launch event
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: recording launch of daemon [[64011,0],1]
[localhost.localdomain:10884] [[64011,0],0] plm:rsh: executing: (/usr/bin/ssh) [/usr/bin/ssh 192.168.54.56 orted -mca ess env -mca orte_ess_jobid 4195024896 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri "4195024896.0;tcp://192.168.54.137,192.168.122.1:45032" --tree-spawn -mca plm_base_verbose 5 -mca oob_base_verbose 5 -mca plm rsh -mca orte_leave_session_attached 1]
[CUDAServer:04970] mca:base:select:( plm) Querying component [rsh]
[CUDAServer:04970] mca:base:select:( plm) Query of component [rsh] set priority to 10
[CUDAServer:04970] mca:base:select:( plm) Selected component [rsh]
[CUDAServer:04970] mca:oob:select: checking available component tcp
[CUDAServer:04970] mca:oob:select: Querying component [tcp]
[CUDAServer:04970] oob:tcp: component_available called
[CUDAServer:04970] [[64011,0],1] TCP STARTUP
[CUDAServer:04970] [[64011,0],1] attempting to bind to IPv4 port 0
[CUDAServer:04970] mca:oob:select: Adding component to end
[CUDAServer:04970] mca:oob:select: Found 1 active transports
[CUDAServer:04970] [[64011,0],1]: set_addr to uri 4195024896.0;tcp://192.168.54.137,192.168.122.1:45032
[CUDAServer:04970] [[64011,0],1]:set_addr checking if peer [[64011,0],0] is reachable via component tcp
[CUDAServer:04970] [[64011,0],1] oob:tcp: working peer [[64011,0],0] address tcp://192.168.54.137,192.168.122.1:45032
[CUDAServer:04970] [[64011,0],1]:tcp set addr for peer [[64011,0],0]
[CUDAServer:04970] [[64011,0],1]: peer [[64011,0],0] is reachable via component tcp
[CUDAServer:04970] [[64011,0],1] OOB_SEND: rml_oob_send.c:199
[CUDAServer:04970] [[64011,0],1] oob:base:send to target [[64011,0],0]
[CUDAServer:04970] [[64011,0],1] oob:tcp:send_nb to peer [[64011,0],0]:10
[CUDAServer:04970] [[64011,0],1] tcp:send_nb to peer [[64011,0],0]
[CUDAServer:04970] [[64011,0],1]:[oob_tcp.c:508] post send to [[64011,0],0]
[CUDAServer:04970] [[64011,0],1]:[oob_tcp.c:442] processing send to peer [[64011,0],0]:10
[CUDAServer:04970] [[64011,0],1]:[oob_tcp.c:476] queue pending to [[64011,0],0]
[CUDAServer:04970] [[64011,0],1] tcp:send_nb: initiating connection to [[64011,0],0]
[CUDAServer:04970] [[64011,0],1]:[oob_tcp.c:490] connect to [[64011,0],0]
[localhost.localdomain:10884] [[64011,0],0] connection_handler: working connection (12, 0) 192.168.54.56:38362
[CUDAServer:04970] [[64011,0],1] MESSAGE SEND COMPLETE TO [[64011,0],0] OF 12963 BYTES ON SOCKET 9
[localhost.localdomain:10884] [[64011,0],0] ORTE_ERROR_LOG: Data unpack failed in file base/plm_base_launch_support.c at line 964 <==ERROR=============================================
[localhost.localdomain:10884] [[64011,0],0] plm:base:orted_cmd sending orted_exit commands
[localhost.localdomain:10884] [[64011,0],0] plm:base:receive stop comm
[localhost.localdomain:10884] [[64011,0],0] TCP SHUTDOWN
[localhost.localdomain:10884] [[64011,0],0] RELEASING PEER OBJ [[64011,0],1]
[localhost.localdomain:10884] [[64011,0],0] CLOSING SOCKET 12
[CUDAServer:04970] [[64011,0],1]-[[64011,0],0] mca_oob_tcp_msg_recv: peer closed connection
[CUDAServer:04970] [[64011,0],1] TCP SHUTDOWN
[CUDAServer:04970] [[64011,0],1] RELEASING PEER OBJ [[64011,0],0]
[CUDAServer:04970] [[64011,0],1] CLOSING SOCKET 9

Regards
Benjamin Giehle