Hi,

> We shouldn't just hang - that isn't right. Can you configure
> OMPI with --enable-debug and then add "-mca plm_base_verbose 5
> -mca state_base_verbose 5" to your cmd line so we can see where
> it is hanging?

The program doesn't hang. It completes without any output and returns status "1".

tyr small_prog 55 mpiexec -np 3 -host rs0,sunpc1,linpc1 \
  -mca plm_base_verbose 5 -mca state_base_verbose 5 rank_size
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [app]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Query of component [hnp] set priority to 60
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [novm]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [orted]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [staged_hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [staged_orted]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Querying component [tool]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Skipping component [tool].
Query failed to return a module
[tyr.informatik.hs-fulda.de:12297] mca:base:select:(state) Selected component [hnp]
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Querying component [rsh]
[tyr.informatik.hs-fulda.de:12297] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Query of component [rsh] set priority to 10
[tyr.informatik.hs-fulda.de:12297] mca:base:select:( plm) Selected component [rsh]
[tyr.informatik.hs-fulda.de:12297] plm:base:set_hnp_name: initial bias 12297 nodename hash 339128848
[tyr.informatik.hs-fulda.de:12297] plm:base:set_hnp_name: final jobfam 38447
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:receive start comm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [INVALID] STATE PENDING INIT AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:900
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [INVALID] STATE PENDING INIT PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_job
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE INIT_COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:317
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE INIT_COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE PENDING ALLOCATION AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:328
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE PENDING ALLOCATION PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE ALLOCATION COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/ras/base/ras_base_allocate.c:423
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE ALLOCATION COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] STATE PENDING DAEMON LAUNCH AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:184
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB [38447,1] STATE PENDING DAEMON LAUNCH PRI 4
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm creating map
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] setup:vm: working unmanaged allocation
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] using dash_host
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node rs0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node sunpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] checking node linpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],1]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],1] to node rs0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],2]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],2] to node sunpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm add new daemon [[38447,0],3]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:setup_vm assigning new daemon [[38447,0],3] to node linpc1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: launching vm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: local shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: assuming same remote shell as local shell
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: remote shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: final template argv: /usr/local/bin/ssh <template> orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh:launch daemon 0 not a child of mine
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: adding node rs0 to launch list
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: adding node sunpc1 to launch list
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh:launch daemon 3 not a child of mine
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: activating launch event
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: recording launch of daemon [[38447,0],1]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs0 orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca ess env -mca orte_ess_jobid 2519662592 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2519662592.0;tcp://193.174.24.39:59753" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh]
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:rsh: recording launch of daemon [[38447,0],2]
X11 forwarding request failed on channel 0
[sunpc1:22290] mca:base:select:(state) Querying component [app]
[sunpc1:22290] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [hnp]
[sunpc1:22290] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [novm]
[sunpc1:22290] mca:base:select:(state) Skipping component [novm].
Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [orted]
[sunpc1:22290] mca:base:select:(state) Query of component [orted] set priority to 100
[sunpc1:22290] mca:base:select:(state) Querying component [staged_hnp]
[sunpc1:22290] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [staged_orted]
[sunpc1:22290] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Querying component [tool]
[sunpc1:22290] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[sunpc1:22290] mca:base:select:(state) Selected component [orted]
[sunpc1:22290] mca:base:select:( plm) Querying component [rsh]
[sunpc1:22290] [[38447,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
[sunpc1:22290] mca:base:select:( plm) Query of component [rsh] set priority to 10
[sunpc1:22290] mca:base:select:( plm) Selected component [rsh]
[sunpc1:22290] [[38447,0],2] plm:rsh_setup on agent ssh : rsh path NULL
[sunpc1:22290] [[38447,0],2] plm:base:receive start comm
[sunpc1:22290] [[38447,0],2] ACTIVATE PROC [[38447,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[sunpc1:22290] [[38447,0],2] ACTIVATING PROC [[38447,0],0] STATE UNABLE TO SEND MSG PRI 0
[sunpc1:22290] [[38447,0],2] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22290] [[38447,0],2] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22290] [[38447,0],2] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[sunpc1:22290] [[38447,0],2] plm:base:receive stop comm
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] daemon 2 failed with status 1
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE PROC [[38447,0],2] STATE FAILED TO START AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:304
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING PROC [[38447,0],2] STATE FAILED TO START PRI 0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:orted_cmd sending orted_exit commands
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB NULL STATE DAEMONS TERMINATED AT ../../openmpi-1.7.4rc2r30094/orte/orted/orted_comm.c:465
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATING JOB NULL STATE DAEMONS TERMINATED PRI 0
[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] plm:base:receive stop comm
tyr small_prog 56 [rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [app]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [hnp]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [novm]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Query of component [orted] set priority to 100
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [staged_hnp]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [staged_orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [staged_orted].
Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Querying component [tool]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03686] mca:base:select:(state) Selected component [orted]
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Querying component [rsh]
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Query of component [rsh] set priority to 10
[rs0.informatik.hs-fulda.de:03686] mca:base:select:( plm) Selected component [rsh]
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:rsh_setup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:base:receive start comm
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATE PROC [[38447,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATING PROC [[38447,0],0] STATE UNABLE TO SEND MSG PRI 0
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[rs0.informatik.hs-fulda.de:03686] [[38447,0],1] plm:base:receive stop comm

tyr small_prog 56 echo $status
1
tyr small_prog 57
tyr small_prog 57 mpiexec -np 3 -host rs0,sunpc1,linpc1 -mca plm_base_verbose 5 \
  -mca state_base_verbose 5 --hetero-nodes --hetero-apps rank_size
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [app]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Query of component [hnp] set priority to 60
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [novm]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [orted]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [staged_hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [staged_orted]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Querying component [tool]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Skipping component [tool].
Query failed to return a module
[tyr.informatik.hs-fulda.de:12313] mca:base:select:(state) Selected component [hnp]
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Querying component [rsh]
[tyr.informatik.hs-fulda.de:12313] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Query of component [rsh] set priority to 10
[tyr.informatik.hs-fulda.de:12313] mca:base:select:( plm) Selected component [rsh]
[tyr.informatik.hs-fulda.de:12313] plm:base:set_hnp_name: initial bias 12313 nodename hash 339128848
[tyr.informatik.hs-fulda.de:12313] plm:base:set_hnp_name: final jobfam 38463
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:receive start comm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [INVALID] STATE PENDING INIT AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:900
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [INVALID] STATE PENDING INIT PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_job
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE INIT_COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:317
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE INIT_COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE PENDING ALLOCATION AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:328
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE PENDING ALLOCATION PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE ALLOCATION COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/ras/base/ras_base_allocate.c:423
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE ALLOCATION COMPLETE PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB [38463,1] STATE PENDING DAEMON LAUNCH AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/plm_base_launch_support.c:184
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB [38463,1] STATE PENDING DAEMON LAUNCH PRI 4
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm creating map
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] setup:vm: working unmanaged allocation
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] using dash_host
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node rs0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node sunpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] checking node linpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],1] to node rs0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],2]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],2] to node sunpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm add new daemon [[38463,0],3]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:setup_vm assigning new daemon [[38463,0],3] to node linpc1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: launching vm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: local shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: assuming same remote shell as local shell
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: remote shell: 2 (tcsh)
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: final template argv: /usr/local/bin/ssh <template> orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh:launch daemon 0 not a child of mine
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: adding node rs0 to launch list
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: adding node sunpc1 to launch list
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh:launch daemon 3 not a child of mine
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: activating launch event
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: recording launch of daemon [[38463,0],1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs0 orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca orte_hetero_nodes 1 -mca ess env -mca orte_ess_jobid 2520711168 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "2520711168.0;tcp://193.174.24.39:59756" --tree-spawn -mca plm_base_verbose 5 -mca state_base_verbose 5 -mca plm rsh -mca orte_hetero_apps 1]
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:rsh: recording launch of daemon [[38463,0],2]
Warning: No xauth data; using fake authentication data for X11 forwarding.
X11 forwarding request failed on channel 0
[sunpc1:22320] mca:base:select:(state) Querying component [app]
[sunpc1:22320] mca:base:select:(state) Skipping component [app].
Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [hnp]
[sunpc1:22320] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [novm]
[sunpc1:22320] mca:base:select:(state) Skipping component [novm]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [orted]
[sunpc1:22320] mca:base:select:(state) Query of component [orted] set priority to 100
[sunpc1:22320] mca:base:select:(state) Querying component [staged_hnp]
[sunpc1:22320] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [staged_orted]
[sunpc1:22320] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Querying component [tool]
[sunpc1:22320] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[sunpc1:22320] mca:base:select:(state) Selected component [orted]
[sunpc1:22320] mca:base:select:( plm) Querying component [rsh]
[sunpc1:22320] [[38463,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
[sunpc1:22320] mca:base:select:( plm) Query of component [rsh] set priority to 10
[sunpc1:22320] mca:base:select:( plm) Selected component [rsh]
[sunpc1:22320] [[38463,0],2] plm:rsh_setup on agent ssh : rsh path NULL
[sunpc1:22320] [[38463,0],2] plm:base:receive start comm
[sunpc1:22320] [[38463,0],2] ACTIVATE PROC [[38463,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[sunpc1:22320] [[38463,0],2] ACTIVATING PROC [[38463,0],0] STATE UNABLE TO SEND MSG PRI 0
[sunpc1:22320] [[38463,0],2] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22320] [[38463,0],2] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[sunpc1:22320] [[38463,0],2] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[sunpc1:22320] [[38463,0],2] plm:base:receive stop comm
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] daemon 2 failed with status 1
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE PROC [[38463,0],2] STATE FAILED TO START AT ../../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/rsh/plm_rsh_module.c:304
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING PROC [[38463,0],2] STATE FAILED TO START PRI 0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:orted_cmd sending orted_exit commands
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATE JOB NULL STATE DAEMONS TERMINATED AT ../../openmpi-1.7.4rc2r30094/orte/orted/orted_comm.c:465
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] ACTIVATING JOB NULL STATE DAEMONS TERMINATED PRI 0
[tyr.informatik.hs-fulda.de:12313] [[38463,0],0] plm:base:receive stop comm
tyr small_prog 58 [rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [app]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [app]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [hnp]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [novm]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [novm].
Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Query of component [orted] set priority to 100
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [staged_hnp]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [staged_hnp]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [staged_orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [staged_orted]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Querying component [tool]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Skipping component [tool]. Query failed to return a module
[rs0.informatik.hs-fulda.de:03718] mca:base:select:(state) Selected component [orted]
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Querying component [rsh]
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:rsh_lookup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Query of component [rsh] set priority to 10
[rs0.informatik.hs-fulda.de:03718] mca:base:select:( plm) Selected component [rsh]
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:rsh_setup on agent ssh : rsh path NULL
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:base:receive start comm
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATE PROC [[38463,0],0] STATE UNABLE TO SEND MSG AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATING PROC [[38463,0],0] STATE UNABLE TO SEND MSG PRI 0
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] FORCE-TERMINATE AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATE JOB NULL STATE FORCED EXIT AT ../../../../../openmpi-1.9a1r30100/orte/mca/errmgr/default_orted/errmgr_default_orted.c:259
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] ACTIVATING JOB NULL STATE FORCED EXIT PRI 0
[rs0.informatik.hs-fulda.de:03718] [[38463,0],1] plm:base:receive stop comm

tyr small_prog 58 echo $status
1
tyr small_prog 59


Kind regards

Siegmar


> On Jan 1, 2014, at 1:48 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > In the past I could run a small program in a real heterogeneous
> > system with little (sunpc1, linpc1) and big endian (rs0, tyr)
> > machines.
> >
> > tyr small_prog 101 ompi_info | grep MPI:
> > Open MPI: 1.6.6a1r29175
> > tyr small_prog 102 mpiexec -np 3 -host rs0,sunpc1,linpc1 rank_size
> > I'm process 1 of 3 available processes running on sunpc1.
> > MPI standard 2.1 is supported.
> > I'm process 0 of 3 available processes running on
> > rs0.informatik.hs-fulda.de.
> > MPI standard 2.1 is supported.
> > I'm process 2 of 3 available processes running on linpc1.
> > MPI standard 2.1 is supported.
> > tyr small_prog 103
> >
> >
> > Now I get no output at all.
> >
> > tyr small_prog 130 ompi_info | grep MPI:
> > Open MPI: 1.9a1r30100
> > tyr small_prog 131 mpiexec -np 3 -host rs0,sunpc1,linpc1 rank_size
> > tyr small_prog 132 mpiexec -np 3 -host rs0,sunpc1,linpc1 \
> > --hetero-nodes --hetero-apps rank_size
> > tyr small_prog 133
> >
> >
> > Perhaps this behaviour is intended, because Open MPI doesn't
> > support little and big endian machines in the same cluster or
> > virtual computer (I know only LAM-MPI which works in such an
> > environment). On the other side: Does it make sense to run
> > the command without any output, if it doesn't work (even if
> > "mpiexec" returns "1")?
>
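
The verbose output above names source-tree paths in its "ACTIVATE ... AT" entries, and those paths differ by host: entries from tyr cite openmpi-1.7.4rc2r30094, while entries from sunpc1 and rs0 cite openmpi-1.9a1r30100. A minimal Python sketch for tabulating this from a saved copy of the output follows. The function name `trees_by_host` and both regular expressions are illustrative assumptions, not part of Open MPI. The sketch only reports which trees each host mentions; it does not establish that the difference causes the "UNABLE TO SEND MSG" failure.

```python
import re
from collections import defaultdict

# "[host:pid]" prefix that starts each entry (assumed format, taken from the output above).
ENTRY = re.compile(r"\[([A-Za-z0-9.\-]+):\d+\]")
# "openmpi-<version>/" path component inside an entry.
TREE = re.compile(r"openmpi-([0-9A-Za-z.]+)/")


def trees_by_host(log_text):
    """Map each host to the set of source-tree versions named in its entries."""
    result = defaultdict(set)
    # re.split with one capture group yields [prefix, host1, body1, host2, body2, ...],
    # so this works whether or not the entries are on separate lines.
    parts = ENTRY.split(log_text)
    for host, body in zip(parts[1::2], parts[2::2]):
        for version in TREE.findall(body):
            result[host].add(version)
    return dict(result)


# Two entries copied from the output above.
sample = (
    "[tyr.informatik.hs-fulda.de:12297] [[38447,0],0] ACTIVATE JOB [38447,1] "
    "STATE INIT_COMPLETE AT ../../../../openmpi-1.7.4rc2r30094/orte/mca/plm/base/"
    "plm_base_launch_support.c:317\n"
    "[sunpc1:22290] [[38447,0],2] ACTIVATE PROC [[38447,0],0] STATE UNABLE TO SEND MSG "
    "AT ../../../../openmpi-1.9a1r30100/orte/mca/rml/base/rml_base_frame.c:205\n"
)

for host, versions in sorted(trees_by_host(sample).items()):
    print(host, sorted(versions))
```

To check a whole run, the complete mpiexec output could be saved to a file and its contents passed to `trees_by_host` in place of `sample`.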