Hi,

today I tested rankfiles once more. The good news first: openmpi-1.7.4 now supports binding on my Sun M4000 server with SPARC64 VII processors when I request it on the command line.

rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings -np 4 \
  --bind-to hwthread hostname
[rs0.informatik.hs-fulda.de:06051] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [../B./../..][../../../..]
[rs0.informatik.hs-fulda.de:06051] MCW rank 2 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
[rs0.informatik.hs-fulda.de:06051] MCW rank 3 bound to socket 1[core 5[hwt 0]]: [../../../..][../B./../..]
[rs0.informatik.hs-fulda.de:06051] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
rs0.informatik.hs-fulda.de
rs0.informatik.hs-fulda.de
rs0.informatik.hs-fulda.de
rs0.informatik.hs-fulda.de
rs0 openmpi_1.7.x_or_newer 105

Thank you very much for solving this problem. Unfortunately I still have a problem with a rankfile. Contents of my rankfile:

rank 0=rs0 slot=0:0-7
rank 1=rs0 slot=1
rank 2=rs1 slot=0
rank 3=rs1 slot=1

rs0 openmpi_1.7.x_or_newer 105 mpiexec --report-bindings \
  --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
[rs0.informatik.hs-fulda.de:06060] [[7659,0],0] ORTE_ERROR_LOG: Not found in file .../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
[rs0.informatik.hs-fulda.de:06060] [[7659,0],0] ORTE_ERROR_LOG: Not found in file .../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284
rs0 openmpi_1.7.x_or_newer 106

rs0 openmpi_1.7.x_or_newer 110 mpiexec --report-bindings \
  --display-allocation --mca rmaps_base_verbose_100 \
  --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname

====================== ALLOCATED NODES ======================
rs0: slots=2 max_slots=0 slots_inuse=0
rs1: slots=2 max_slots=0 slots_inuse=0
=================================================================
[rs0.informatik.hs-fulda.de:06074] [[7677,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
[rs0.informatik.hs-fulda.de:06074] [[7677,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line
284 rs0 openmpi_1.7.x_or_newer 111 rs0 openmpi_1.7.x_or_newer 111 mpiexec --report-bindings --display-allocation --mca ess_base_verbose 5 --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [env] [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [env]. Query failed to return a module [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [hnp] [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Query of component [hnp] set priority to 100 [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [singleton] [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Querying component [tool] [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module [rs0.informatik.hs-fulda.de:06078] mca:base:select:( ess) Selected component [hnp] [rs0.informatik.hs-fulda.de:06078] [[INVALID],INVALID] Topology Info: [rs0.informatik.hs-fulda.de:06078] Type: Machine Number of child objects: 1 Name=NULL total=33554432KB Backend=Solaris OSName=SunOS OSRelease=5.10 OSVersion=Generic_150400-04 Architecture=sun4u Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Bind CPU proc: TRUE Bind CPU thread: TRUE Bind MEM proc: TRUE Bind MEM thread: TRUE Type: NUMANode Number of child objects: 2 Name=NULL local=33554432KB total=33554432KB Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x000000ff Online: 0x000000ff Allowed: 0x000000ff Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000003 Online: 0x00000003 Allowed: 0x00000003 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: PU Number of child 
objects: 0 Name=NULL Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000000c Online: 0x0000000c Allowed: 0x0000000c Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000030 Online: 0x00000030 Allowed: 0x00000030 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000010 Online: 0x00000010 Allowed: 0x00000010 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000020 Online: 0x00000020 Allowed: 0x00000020 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x000000c0 Online: 0x000000c0 Allowed: 0x000000c0 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000040 Online: 0x00000040 Allowed: 0x00000040 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000080 Online: 0x00000080 Allowed: 0x00000080 Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x0000ff00 Online: 0x0000ff00 Allowed: 0x0000ff00 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000300 Online: 0x00000300 Allowed: 0x00000300 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000100 Online: 0x00000100 Allowed: 0x00000100 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000200 Online: 0x00000200 Allowed: 0x00000200 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000c00 Online: 0x00000c00 Allowed: 0x00000c00 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000400 Online: 0x00000400 Allowed: 0x00000400 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000800 Online: 0x00000800 Allowed: 0x00000800 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00003000 Online: 0x00003000 Allowed: 0x00003000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00001000 Online: 0x00001000 Allowed: 
0x00001000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00002000 Online: 0x00002000 Allowed: 0x00002000 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000c000 Online: 0x0000c000 Allowed: 0x0000c000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00004000 Online: 0x00004000 Allowed: 0x00004000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00008000 Online: 0x00008000 Allowed: 0x00008000 [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Querying component [env] [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Query of component [env] set priority to 20 [rs1.informatik.hs-fulda.de:09657] mca:base:select:( ess) Selected component [env] [rs1.informatik.hs-fulda.de:09657] ess:env set name to [[7673,0],1] [rs1.informatik.hs-fulda.de:09657] [[7673,0],1] Topology Info: [rs1.informatik.hs-fulda.de:09657] Type: Machine Number of child objects: 1 Name=NULL total=33554432KB Backend=Solaris OSName=SunOS OSRelease=5.10 OSVersion=Generic_150400-04 Architecture=sun4u Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Bind CPU proc: TRUE Bind CPU thread: TRUE Bind MEM proc: TRUE Bind MEM thread: TRUE Type: NUMANode Number of child objects: 2 Name=NULL local=33554432KB total=33554432KB Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x000000ff Online: 0x000000ff Allowed: 0x000000ff Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000003 Online: 0x00000003 Allowed: 0x00000003 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000000c Online: 0x0000000c Allowed: 0x0000000c Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004 Type: PU Number of 
child objects: 0 Name=NULL Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000030 Online: 0x00000030 Allowed: 0x00000030 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000010 Online: 0x00000010 Allowed: 0x00000010 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000020 Online: 0x00000020 Allowed: 0x00000020 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x000000c0 Online: 0x000000c0 Allowed: 0x000000c0 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000040 Online: 0x00000040 Allowed: 0x00000040 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000080 Online: 0x00000080 Allowed: 0x00000080 Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x0000ff00 Online: 0x0000ff00 Allowed: 0x0000ff00 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000300 Online: 0x00000300 Allowed: 0x00000300 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000100 Online: 0x00000100 Allowed: 0x00000100 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000200 Online: 0x00000200 Allowed: 0x00000200 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000c00 Online: 0x00000c00 Allowed: 0x00000c00 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000400 Online: 0x00000400 Allowed: 0x00000400 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000800 Online: 0x00000800 Allowed: 0x00000800 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00003000 Online: 0x00003000 Allowed: 0x00003000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00001000 Online: 0x00001000 Allowed: 0x00001000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00002000 Online: 0x00002000 Allowed: 0x00002000 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000c000 Online: 0x0000c000 Allowed: 0x0000c000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00004000 Online: 0x00004000 Allowed: 
0x00004000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00008000 Online: 0x00008000 Allowed: 0x00008000 ====================== ALLOCATED NODES ====================== rs0: slots=2 max_slots=0 slots_inuse=0 rs1: slots=2 max_slots=0 slots_inuse=0 ================================================================= [rs0.informatik.hs-fulda.de:06078] [[7673,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283 [rs0.informatik.hs-fulda.de:06078] [[7673,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284 [rs1.informatik.hs-fulda.de:09657] [[7673,0],1] setting up session dir with tmpdir: UNDEF host rs1 rs0 openmpi_1.7.x_or_newer 112 rs0 openmpi_1.7.x_or_newer 113 mpiexec --report-bindings --display-allocation --mca plm_base_verbose 100 --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: registering plm components [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: found loaded component rsh [rs0.informatik.hs-fulda.de:06088] mca: base: components_register: component rsh register function successful [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: opening plm components [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: found loaded component rsh [rs0.informatik.hs-fulda.de:06088] mca: base: components_open: component rsh open function successful [rs0.informatik.hs-fulda.de:06088] mca:base:select: Auto-selecting plm components [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Querying component [rsh] [rs0.informatik.hs-fulda.de:06088] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Query of component [rsh] set priority to 10 [rs0.informatik.hs-fulda.de:06088] mca:base:select:( plm) Selected component [rsh] [rs0.informatik.hs-fulda.de:06088] 
plm:base:set_hnp_name: initial bias 6088 nodename hash 3909477186 [rs0.informatik.hs-fulda.de:06088] plm:base:set_hnp_name: final jobfam 7567 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh_setup on agent ssh : rsh path NULL [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:receive start comm [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_job [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm creating map [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] setup:vm: working unmanaged allocation [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] using rankfile rf_rs0_rs1 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] checking node rs0 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ignoring myself [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] checking node rs1 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm add new daemon [[7567,0],1] [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:setup_vm assigning new daemon [[7567,0],1] to node rs1 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: launching vm [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: local shell: 2 (tcsh) [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: assuming same remote shell as local shell [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: remote shell: 2 (tcsh) [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: final template argv: /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 495910912 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 2 -mca orte_hnp_uri "495910912.0;tcp://193.174.26.198,192.168.128.1,10.1.1.2:43810" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca orte_rankfile rf_rs0_rs1 -mca hwloc_base_use_hwthreads_as_cpus 1 -mca orte_display_alloc 1 -mca hwloc_base_report_bindings 1 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh:launch daemon 0 not a 
child of mine [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: adding node rs1 to launch list [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: activating launch event [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: recording launch of daemon [[7567,0],1] [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh rs1 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 495910912 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 -mca orte_hnp_uri "495910912.0;tcp://193.174.26.198,192.168.128.1,10.1.1.2:43810" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca orte_rankfile rf_rs0_rs1 -mca hwloc_base_use_hwthreads_as_cpus 1 -mca orte_display_alloc 1 -mca hwloc_base_report_bindings 1] Warning: untrusted X11 forwarding setup failed: xauth key data not generated Warning: No xauth data; using fake authentication data for X11 forwarding. [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: registering plm components [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: found loaded component rsh [rs1.informatik.hs-fulda.de:09721] mca: base: components_register: component rsh register function successful [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: opening plm components [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: found loaded component rsh [rs1.informatik.hs-fulda.de:09721] mca: base: components_open: component rsh open function successful [rs1.informatik.hs-fulda.de:09721] mca:base:select: Auto-selecting plm components [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Querying component [rsh] [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh_lookup on agent ssh : rsh path NULL [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Query of component [rsh] set priority to 10 [rs1.informatik.hs-fulda.de:09721] mca:base:select:( plm) Selected component [rsh] [rs1.informatik.hs-fulda.de:09721] 
[[7567,0],1] plm:rsh_setup on agent ssh : rsh path NULL [rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:base:receive start comm [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch from daemon [[7567,0],1] [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch from daemon [[7567,0],1] on node rs1 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] RECEIVED TOPOLOGY FROM NODE rs1 [rs0.informatik.hs-fulda.de:06088] Type: Machine Number of child objects: 1 Name=NULL total=33554432KB Backend=Solaris OSName=SunOS OSRelease=5.10 OSVersion=Generic_150400-04 Architecture=sun4u Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Bind CPU proc: TRUE Bind CPU thread: TRUE Bind MEM proc: TRUE Bind MEM thread: TRUE Type: NUMANode Number of child objects: 2 Name=NULL local=33554432KB total=33554432KB Cpuset: 0x0000ffff Online: 0x0000ffff Allowed: 0x0000ffff Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x000000ff Online: 0x000000ff Allowed: 0x000000ff Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000003 Online: 0x00000003 Allowed: 0x00000003 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000000c Online: 0x0000000c Allowed: 0x0000000c Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000030 Online: 0x00000030 Allowed: 0x00000030 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000010 Online: 0x00000010 Allowed: 0x00000010 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000020 Online: 0x00000020 Allowed: 0x00000020 
Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x000000c0 Online: 0x000000c0 Allowed: 0x000000c0 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000040 Online: 0x00000040 Allowed: 0x00000040 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000080 Online: 0x00000080 Allowed: 0x00000080 Type: Socket Number of child objects: 4 Name=NULL CPUType=sparcv9 CPUModel=SPARC64_VII Cpuset: 0x0000ff00 Online: 0x0000ff00 Allowed: 0x0000ff00 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000300 Online: 0x00000300 Allowed: 0x00000300 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000100 Online: 0x00000100 Allowed: 0x00000100 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000200 Online: 0x00000200 Allowed: 0x00000200 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00000c00 Online: 0x00000c00 Allowed: 0x00000c00 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000400 Online: 0x00000400 Allowed: 0x00000400 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000800 Online: 0x00000800 Allowed: 0x00000800 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x00003000 Online: 0x00003000 Allowed: 0x00003000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00001000 Online: 0x00001000 Allowed: 0x00001000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00002000 Online: 0x00002000 Allowed: 0x00002000 Type: Core Number of child objects: 2 Name=NULL Cpuset: 0x0000c000 Online: 0x0000c000 Allowed: 0x0000c000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00004000 Online: 0x00004000 Allowed: 0x00004000 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00008000 Online: 0x00008000 Allowed: 0x00008000 [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] TOPOLOGY MATCHES - DISCARDING [rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_report_launch completed for daemon [[7567,0],1] at contact 495910912.1;tcp://193.174.26.199,192.168.128.2,10.1.1.2:37231 ====================== 
ALLOCATED NODES ======================
rs0: slots=2 max_slots=0 slots_inuse=0
rs1: slots=2 max_slots=0 slots_inuse=0
=================================================================
[rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh: remote spawn called
[rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:rsh: remote spawn - have no children!
[rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.7.4/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 283
[rs0.informatik.hs-fulda.de:06088] [[7567,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.7.4/orte/mca/rmaps/base/rmaps_base_map_job.c at line 284
[rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:orted_cmd sending orted_exit commands
[rs0.informatik.hs-fulda.de:06088] [[7567,0],0] plm:base:receive stop comm
[rs0.informatik.hs-fulda.de:06088] mca: base: close: component rsh closed
[rs0.informatik.hs-fulda.de:06088] mca: base: close: unloading component rsh
[rs1.informatik.hs-fulda.de:09721] [[7567,0],1] plm:base:receive stop comm
[rs1.informatik.hs-fulda.de:09721] mca: base: close: component rsh closed
[rs1.informatik.hs-fulda.de:09721] mca: base: close: unloading component rsh
rs0 openmpi_1.7.x_or_newer 114

I also still have the problem that I get no output when I mix little-endian and big-endian machines; this works with openmpi-1.6.x.
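
As an aside for readers who have not dealt with mixed byte orders: the following is a minimal Python sketch (not Open MPI code; the function names are made up for illustration) of how the same 32-bit integer is laid out on a little-endian host such as the x86_64 machine linpc0 and on a big-endian host such as the SPARC machine rs0, and what happens if the bytes are read back with the wrong byte order.

```python
import struct

# Map a byte-order name to the matching struct format for an unsigned 32-bit int.
_FORMATS = {"little": "<I", "big": ">I"}


def pack_u32(value: int, byte_order: str) -> bytes:
    """Pack an unsigned 32-bit integer as 'little' or 'big' endian bytes."""
    return struct.pack(_FORMATS[byte_order], value)


def read_u32(raw: bytes, byte_order: str) -> int:
    """Interpret four raw bytes as an unsigned 32-bit integer in the given byte order."""
    return struct.unpack(_FORMATS[byte_order], raw)[0]


if __name__ == "__main__":
    value = 0x01020304
    little = pack_u32(value, "little")
    big = pack_u32(value, "big")
    print("little endian bytes:", little.hex())
    print("big endian bytes:   ", big.hex())
    # Reading big-endian bytes as if they were little-endian yields a different number.
    print("misread value:      ", hex(read_u32(big, "little")))
```

This only illustrates the data-representation difference between the two kinds of hosts; it makes no claim about where the missing output in the runs below comes from.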

linpc1 openmpi_1.7.x_or_newer 112 mpiexec -report-bindings -np 4 \
  -rf rf_linpc_sunpc_tyr hostname
linpc1 openmpi_1.7.x_or_newer 113

linpc1 openmpi_1.7.x_or_newer 188 mpiexec -report-bindings --display-allocation --mca plm_base_verbose 100 -np 1 -rf rf_linpc_sunpc_tyr hostname
[linpc1:20650] mca: base: components_register: registering plm components
[linpc1:20650] mca: base: components_register: found loaded component rsh
[linpc1:20650] mca: base: components_register: component rsh register function successful
[linpc1:20650] mca: base: components_register: found loaded component slurm
[linpc1:20650] mca: base: components_register: component slurm register function successful
[linpc1:20650] mca: base: components_open: opening plm components
[linpc1:20650] mca: base: components_open: found loaded component rsh
[linpc1:20650] mca: base: components_open: component rsh open function successful
[linpc1:20650] mca: base: components_open: found loaded component slurm
[linpc1:20650] mca: base: components_open: component slurm open function successful
[linpc1:20650] mca:base:select: Auto-selecting plm components
[linpc1:20650] mca:base:select:( plm) Querying component [rsh]
[linpc1:20650] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
[linpc1:20650] mca:base:select:( plm) Query of component [rsh] set priority to 10
[linpc1:20650] mca:base:select:( plm) Querying component [slurm]
[linpc1:20650] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[linpc1:20650] mca:base:select:( plm) Selected component [rsh]
[linpc1:20650] mca: base: close: component slurm closed
[linpc1:20650] mca: base: close: unloading component slurm
[linpc1:20650] plm:base:set_hnp_name: initial bias 20650 nodename hash 3902177415
[linpc1:20650] plm:base:set_hnp_name: final jobfam 14523
[linpc1:20650] [[14523,0],0] plm:rsh_setup on agent ssh : rsh path NULL
[linpc1:20650] [[14523,0],0] plm:base:receive start comm
[linpc1:20650] [[14523,0],0] plm:base:setup_job
[linpc1:20650] [[14523,0],0] plm:base:setup_vm
[linpc1:20650] [[14523,0],0] plm:base:setup_vm creating map
[linpc1:20650] [[14523,0],0] setup:vm: working unmanaged allocation
[linpc1:20650] [[14523,0],0] using rankfile rf_linpc_sunpc_tyr
[linpc1:20650] [[14523,0],0] checking node linpc0
[linpc1:20650] [[14523,0],0] checking node linpc1
[linpc1:20650] [[14523,0],0] ignoring myself
[linpc1:20650] [[14523,0],0] checking node sunpc1
[linpc1:20650] [[14523,0],0] checking node tyr
[linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],1]
[linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],1] to node linpc0
[linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],2]
[linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],2] to node sunpc1
[linpc1:20650] [[14523,0],0] plm:base:setup_vm add new daemon [[14523,0],3]
[linpc1:20650] [[14523,0],0] plm:base:setup_vm assigning new daemon [[14523,0],3] to node tyr
[linpc1:20650] [[14523,0],0] plm:rsh: launching vm
[linpc1:20650] [[14523,0],0] plm:rsh: local shell: 2 (tcsh)
[linpc1:20650] [[14523,0],0] plm:rsh: assuming same remote shell as local shell
[linpc1:20650] [[14523,0],0] plm:rsh: remote shell: 2 (tcsh)
[linpc1:20650] [[14523,0],0] plm:rsh: final template argv: /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr
[linpc1:20650] [[14523,0],0] plm:rsh:launch daemon 0 not a child of mine
[linpc1:20650] [[14523,0],0] plm:rsh: adding node linpc0 to launch list
[linpc1:20650] [[14523,0],0] plm:rsh: adding node sunpc1 to launch list
[linpc1:20650] [[14523,0],0] plm:rsh:launch daemon 3 not a child of mine
[linpc1:20650] [[14523,0],0] plm:rsh: activating launch event
[linpc1:20650] [[14523,0],0] plm:rsh: recording launch of daemon [[14523,0],1]
[linpc1:20650] [[14523,0],0] plm:rsh: recording launch of daemon [[14523,0],2]
[linpc1:20650] [[14523,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh sunpc1 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr]
[linpc1:20650] [[14523,0],0] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh linpc0 orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 4 -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --tree-spawn --mca plm_base_verbose 100 -mca plm rsh -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr]
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
X11 forwarding request failed on channel 0
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
[sunpc1:09408] mca: base: components_register: registering plm components
[sunpc1:09408] mca: base: components_register: found loaded component rsh
[sunpc1:09408] mca: base: components_register: component rsh register function successful
[sunpc1:09408] mca: base: components_open: opening plm components
[sunpc1:09408] mca: base: components_open: found loaded component rsh
[sunpc1:09408] mca: base: components_open: component rsh open function successful
[sunpc1:09408] mca:base:select: Auto-selecting plm components
[sunpc1:09408] mca:base:select:( plm) Querying component [rsh]
[sunpc1:09408] [[14523,0],2] plm:rsh_lookup on agent ssh : rsh path NULL
[sunpc1:09408] mca:base:select:( plm) Query of component [rsh] set priority to 10
[sunpc1:09408] mca:base:select:( plm) Selected component [rsh]
[sunpc1:09408] [[14523,0],2] plm:rsh_setup on agent ssh : rsh path NULL
[sunpc1:09408] [[14523,0],2] plm:base:receive start comm
[linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],2]
[linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],2] on node sunpc1
[linpc1:20650] [[14523,0],0] plm:base:orted_report_launch completed for daemon [[14523,0],2] at contact 951779328.2;tcp://193.174.26.210:33215
[sunpc1:09408] [[14523,0],2] plm:rsh: remote spawn called
[sunpc1:09408] [[14523,0],2] plm:rsh: remote spawn - have no children!
[linpc0:32306] mca: base: components_register: registering plm components [linpc0:32306] mca: base: components_register: found loaded component rsh [linpc0:32306] mca: base: components_register: component rsh register function successful [linpc0:32306] mca: base: components_open: opening plm components [linpc0:32306] mca: base: components_open: found loaded component rsh [linpc0:32306] mca: base: components_open: component rsh open function successful [linpc0:32306] mca:base:select: Auto-selecting plm components [linpc0:32306] mca:base:select:( plm) Querying component [rsh] [linpc0:32306] [[14523,0],1] plm:rsh_lookup on agent ssh : rsh path NULL [linpc0:32306] mca:base:select:( plm) Query of component [rsh] set priority to 10 [linpc0:32306] mca:base:select:( plm) Selected component [rsh] [linpc0:32306] [[14523,0],1] plm:rsh_setup on agent ssh : rsh path NULL [linpc0:32306] [[14523,0],1] plm:base:receive start comm [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],1] [linpc1:20650] [[14523,0],0] plm:base:orted_report_launch from daemon [[14523,0],1] on node linpc0 [linpc1:20650] [[14523,0],0] RECEIVED TOPOLOGY FROM NODE linpc0 [linpc1:20650] Type: Machine Number of child objects: 2 Name=NULL total=8387048KB DMIProductName="Sun Ultra 40 Workstation" DMIProductVersion=11 DMIBoardVendor="Sun Microsystems" DMIBoardName="Sun Ultra 40 Workstation" DMIBoardVersion=50 DMIBoardAssetTag= DMIChassisVendor="Sun Microsystems" DMIChassisType=17 DMIChassisVersion=01 DMIChassisAssetTag= DMIBIOSVendor="Phoenix Technologies Ltd." 
DMIBIOSVersion="1.70 " DMIBIOSDate=02/15/2008 DMISysVendor="Sun Microsystems" Backend=Linux OSName=Linux OSRelease=3.1.10-1.16-desktop OSVersion="#1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078)" Architecture=x86_64 Cpuset: 0x0000000f Online: 0x0000000f Allowed: 0x0000000f Bind CPU proc: TRUE Bind CPU thread: TRUE Bind MEM proc: FALSE Bind MEM thread: TRUE Type: NUMANode Number of child objects: 2 Name=NULL local=4192744KB total=4192744KB Cpuset: 0x00000003 Online: 0x00000003 Allowed: 0x00000003 Type: Socket Number of child objects: 2 Name=NULL CPUModel="Dual Core AMD Opteron(tm) Processor 280" Cpuset: 0x00000003 Online: 0x00000003 Allowed: 0x00000003 Type: L2Cache Number of child objects: 1 Name=NULL size=1024KB linesize=64 ways=16 Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: L1dCache Number of child objects: 1 Name=NULL size=64KB linesize=64 ways=2 Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: Core Number of child objects: 1 Name=NULL Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000001 Online: 0x00000001 Allowed: 0x00000001 Type: L2Cache Number of child objects: 1 Name=NULL size=1024KB linesize=64 ways=16 Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: L1dCache Number of child objects: 1 Name=NULL size=64KB linesize=64 ways=2 Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: Core Number of child objects: 1 Name=NULL Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000002 Online: 0x00000002 Allowed: 0x00000002 Type: Bridge Host->PCI Number of child objects: 4 Name=NULL buses=0000:[00-03] Type: PCI 10de:0053 Number of child objects: 1 Name=nVidia Corporation CK804 IDE busid=0000:00:06.0 class=0101(IDE) PCIVendor="nVidia Corporation" PCIDevice="CK804 IDE" Type: Block Number of child objects: 0 Name=sr0 Type: PCI 10de:0055 Number of child objects: 1 Name=nVidia 
Corporation CK804 Serial ATA Controller busid=0000:00:07.0 class=0101(IDE) PCIVendor="nVidia Corporation" PCIDevice="CK804 Serial ATA Controller"
Type: Block Number of child objects: 0 Name=sda
Type: PCI 10de:0054 Number of child objects: 0 Name=nVidia Corporation CK804 Serial ATA Controller busid=0000:00:08.0 class=0101(IDE) PCIVendor="nVidia Corporation" PCIDevice="CK804 Serial ATA Controller"
Type: PCI 10de:029d Number of child objects: 2 Name=nVidia Corporation G71GL [Quadro FX 3500] busid=0000:03:00.0 class=0300(VGA) PCIVendor="nVidia Corporation" PCIDevice="G71GL [Quadro FX 3500]"
Type: GPU Number of child objects: 0 Name=controlD64
Type: GPU Number of child objects: 0 Name=card0
Type: NUMANode Number of child objects: 2 Name=NULL local=4194304KB total=4194304KB Cpuset: 0x0000000c Online: 0x0000000c Allowed: 0x0000000c
Type: Socket Number of child objects: 2 Name=NULL CPUModel="Dual Core AMD Opteron(tm) Processor 280" Cpuset: 0x0000000c Online: 0x0000000c Allowed: 0x0000000c
Type: L2Cache Number of child objects: 1 Name=NULL size=1024KB linesize=64 ways=16 Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004
Type: L1dCache Number of child objects: 1 Name=NULL size=64KB linesize=64 ways=2 Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004
Type: Core Number of child objects: 1 Name=NULL Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004
Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000004 Online: 0x00000004 Allowed: 0x00000004
Type: L2Cache Number of child objects: 1 Name=NULL size=1024KB linesize=64 ways=16 Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008
Type: L1dCache Number of child objects: 1 Name=NULL size=64KB linesize=64 ways=2 Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008
Type: Core Number of child objects: 1 Name=NULL Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008
Type: PU Number of child objects: 0 Name=NULL Cpuset: 0x00000008 Online: 0x00000008 Allowed: 0x00000008
Type: Bridge Host->PCI Number of child objects: 2 Name=NULL buses=0000:[80-82]
Type: PCI 10de:0054 Number of child objects: 0 Name=nVidia Corporation CK804 Serial ATA Controller busid=0000:80:07.0 class=0101(IDE) PCIVendor="nVidia Corporation" PCIDevice="CK804 Serial ATA Controller"
Type: PCI 10de:0055 Number of child objects: 0 Name=nVidia Corporation CK804 Serial ATA Controller busid=0000:80:08.0 class=0101(IDE) PCIVendor="nVidia Corporation" PCIDevice="CK804 Serial ATA Controller"
[linpc1:20650] [[14523,0],0] NEW TOPOLOGY - ADDING
[linpc1:20650] [[14523,0],0] plm:base:orted_report_launch completed for daemon [[14523,0],1] at contact 951779328.1;tcp://193.174.26.214,192.168.1.1:57891
[linpc0:32306] [[14523,0],1] plm:rsh: remote spawn called
[linpc0:32306] [[14523,0],1] plm:rsh: local shell: 2 (tcsh)
[linpc0:32306] [[14523,0],1] plm:rsh: assuming same remote shell as local shell
[linpc0:32306] [[14523,0],1] plm:rsh: remote shell: 2 (tcsh)
[linpc0:32306] [[14523,0],1] plm:rsh: final template argv: /usr/local/bin/ssh <template> orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid <template> -mca orte_ess_num_procs 4 -mca orte_parent_uri "951779328.1;tcp://193.174.26.214,192.168.1.1:57891" -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --mca plm_base_verbose 100 -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr -mca plm rsh
[linpc0:32306] [[14523,0],1] plm:rsh: activating launch event
[linpc0:32306] [[14523,0],1] plm:rsh: recording launch of daemon [[14523,0],3]
[linpc0:32306] [[14523,0],1] plm:rsh: executing: (/usr/local/bin/ssh) [/usr/local/bin/ssh tyr orted -mca orte_report_bindings 1 -mca ess env -mca orte_ess_jobid 951779328 -mca orte_ess_vpid 3 -mca orte_ess_num_procs 4 -mca orte_parent_uri "951779328.1;tcp://193.174.26.214,192.168.1.1:57891" -mca orte_hnp_uri "951779328.0;tcp://193.174.26.208:46876" --mca plm_base_verbose 100 -mca hwloc_base_report_bindings 1 -mca orte_display_alloc 1 -mca orte_rankfile rf_linpc_sunpc_tyr -mca plm rsh --tree-spawn]
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
[tyr.informatik.hs-fulda.de:23227] mca: base: components_register: registering plm components
[tyr.informatik.hs-fulda.de:23227] mca: base: components_register: found loaded component rsh
[tyr.informatik.hs-fulda.de:23227] mca: base: components_register: component rsh register function successful
[tyr.informatik.hs-fulda.de:23227] mca: base: components_open: opening plm components
[tyr.informatik.hs-fulda.de:23227] mca: base: components_open: found loaded component rsh
[tyr.informatik.hs-fulda.de:23227] mca: base: components_open: component rsh open function successful
[tyr.informatik.hs-fulda.de:23227] mca:base:select: Auto-selecting plm components
[tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Querying component [rsh]
[tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:rsh_lookup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Query of component [rsh] set priority to 10
[tyr.informatik.hs-fulda.de:23227] mca:base:select:( plm) Selected component [rsh]
[tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:rsh_setup on agent ssh : rsh path NULL
[tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:base:receive start comm
[tyr.informatik.hs-fulda.de:23227] [[14523,0],3] plm:base:receive stop comm
[tyr.informatik.hs-fulda.de:23227] mca: base: close: component rsh closed
[tyr.informatik.hs-fulda.de:23227] mca: base: close: unloading component rsh
[linpc0:32306] [[14523,0],1] daemon 3 failed with status 1
[linpc1:20650] [[14523,0],0] plm:base:orted_cmd sending orted_exit commands
[linpc1:20650] [[14523,0],0] plm:base:receive stop comm
[linpc1:20650] mca: base: close: component rsh closed
[linpc1:20650] mca: base: close: unloading component rsh
linpc1 openmpi_1.7.x_or_newer 189
[sunpc1:09408] [[14523,0],2] plm:base:receive stop comm
[sunpc1:09408] mca: base: close: component rsh closed
[sunpc1:09408] mca: base: close: unloading component rsh
[linpc0:32306] [[14523,0],1] plm:base:receive stop comm
[linpc0:32306] mca: base: close: component rsh closed
[linpc0:32306] mca: base: close: unloading component rsh

linpc1 openmpi_1.7.x_or_newer 189
linpc1 openmpi_1.7.x_or_newer 189 mpiexec -report-bindings --display-allocation --mca rmaps_base_verbose_100 -np 1 -rf rf_linpc_sunpc_tyr hostname

====================== ALLOCATED NODES ======================
linpc1: slots=1 max_slots=0 slots_inuse=0
=================================================================
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
line parameter option (remember that mpiexec interprets the first
unrecognized command line token as the executable).

Node: linpc1
Executable: 1
--------------------------------------------------------------------------
linpc1 openmpi_1.7.x_or_newer 190


Kind regards

Siegmar