Hello Ralph,

> Try replacing --report-bindings with -mca hwloc_base_report_bindings 1
> and see if that works
I get even more warnings with the new option. It seems that I always get
the bindings only for the local machine. I used Solaris Sparc (tyr),
Solaris x86_64 (sunpc1), and Linux x86_64 (linpc1) as local machines for
the following outputs.

tyr openmpi_1.7.x_or_newer 104 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[tyr.informatik.hs-fulda.de:00555] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
tyr.informatik.hs-fulda.de
linpc1
linpc0
sunpc1
tyr openmpi_1.7.x_or_newer 105


I get the following slightly different output if I run the command on Linux.

linpc1 openmpi_1.7.x_or_newer 102 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.
  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[linpc1:24181] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
linpc1
linpc0
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong. The
process was killed without launching the target application. Your job
will now abort.

  Local host:        tyr
  Application name:  /usr/local/bin/hostname
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
  Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
--------------------------------------------------------------------------
sunpc1
linpc1 openmpi_1.7.x_or_newer 103


I get a similar output if I run the command on Solaris x86_64.

sunpc1 openmpi_1.7.x_or_newer 105 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.
  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[sunpc1:04874] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
linpc0
linpc1
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong. The
process was killed without launching the target application. Your job
will now abort.

  Local host:        tyr
  Application name:  /usr/local/bin/hostname
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
  Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
--------------------------------------------------------------------------
sunpc1 openmpi_1.7.x_or_newer 106


Kind regards

Siegmar


> On Aug 7, 2014, at 4:04 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi,
> >
> >> I can't replicate - this worked fine for me. I'm at a loss as
> >> to how you got that error as it would require some strange
> >> error in the report-bindings option. If you remove that option
> >> from your cmd line, does the problem go away?
> >
> > Yes.
> >
> > tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname
> > tyr.informatik.hs-fulda.de
> > linpc0
> > linpc1
> > sunpc1
> >
> >
> > tyr openmpi_1.7.x_or_newer 469 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> > --------------------------------------------------------------------------
> > An invalid value was supplied for an enum variable.
> >
> >   Variable     : hwloc_base_report_bindings
> >   Value        : 1,1
> >   Valid values : 0: f|false|disabled, 1: t|true|enabled
> > --------------------------------------------------------------------------
> > tyr.informatik.hs-fulda.de
> > [tyr.informatik.hs-fulda.de:29900] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> > [linpc0:04217] MCW rank 0 is not bound (or bound to all available processors)
> > [linpc1:23107] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> > linpc0
> > linpc1
> > sunpc1
> > tyr openmpi_1.7.x_or_newer 470
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >> On Aug 5, 2014, at 12:56 AM, Siegmar Gross
> >> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> yesterday I installed openmpi-1.8.2rc3 on my machines
> >>> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE
> >>> Linux 12.1 x86_64) with Sun C 5.12. I get an error
> >>> if I use a rankfile for all three architectures.
> >>> The error message depends on the local machine that
> >>> I use to run "mpiexec". I get a different error if I
> >>> use two "Sparc64 VII" machines (see below).
> >>>
> >>> tyr openmpi_1.7.x_or_newer 109 cat rf_linpc_sunpc_tyr
> >>> rank 0=linpc0 slot=0:0-1;1:0-1
> >>> rank 1=linpc1 slot=0:0-1
> >>> rank 2=sunpc1 slot=1:0
> >>> rank 3=tyr slot=1:0
> >>> tyr openmpi_1.7.x_or_newer 110
> >>>
> >>>
> >>> I get the following message if I run "mpiexec" on
> >>> Solaris 10 Sparc.
> >>>
> >>> tyr openmpi_1.7.x_or_newer 110 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> --------------------------------------------------------------------------
> >>> An invalid value was supplied for an enum variable.
> >>>
> >>>   Variable     : hwloc_base_report_bindings
> >>>   Value        : 1,1
> >>>   Valid values : 0: f|false|disabled, 1: t|true|enabled
> >>> --------------------------------------------------------------------------
> >>> [tyr.informatik.hs-fulda.de:26960] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> >>> tyr.informatik.hs-fulda.de
> >>> [linpc1:12109] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> [linpc0:26642] MCW rank 0 is not bound (or bound to all available processors)
> >>> linpc1
> >>> linpc0
> >>> sunpc1
> >>> tyr openmpi_1.7.x_or_newer 111
> >>>
> >>>
> >>>
> >>> I get the following message if I run "mpiexec" on
> >>> Solaris 10 x86_64 or Linux x86_64.
> >>>
> >>> sunpc1 openmpi_1.7.x_or_newer 109 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> --------------------------------------------------------------------------
> >>> An invalid value was supplied for an enum variable.
> >>>
> >>>   Variable     : hwloc_base_report_bindings
> >>>   Value        : 1,1
> >>>   Valid values : 0: f|false|disabled, 1: t|true|enabled
> >>> --------------------------------------------------------------------------
> >>> [sunpc1:02931] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> >>> sunpc1
> >>> [linpc0:26850] MCW rank 0 is not bound (or bound to all available processors)
> >>> [linpc1:12386] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> linpc0
> >>> linpc1
> >>> --------------------------------------------------------------------------
> >>> Open MPI tried to bind a new process, but something went wrong. The
> >>> process was killed without launching the target application.
> >>> Your job will now abort.
> >>>
> >>>   Local host:        tyr
> >>>   Application name:  /usr/local/bin/hostname
> >>>   Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
> >>>   Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
> >>> --------------------------------------------------------------------------
> >>> sunpc1 openmpi_1.7.x_or_newer 110
> >>>
> >>>
> >>>
> >>> The rankfile worked for older versions of Open MPI.
> >>>
> >>> tyr openmpi_1.7.x_or_newer 139 ompi_info | grep MPI:
> >>>   Open MPI: 1.8.2a1r31804
> >>> tyr openmpi_1.7.x_or_newer 140 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> [tyr.informatik.hs-fulda.de:27171] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> >>> tyr.informatik.hs-fulda.de
> >>> [linpc1:12790] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> [linpc0:27221] MCW rank 0 is not bound (or bound to all available processors)
> >>> linpc1
> >>> linpc0
> >>> [sunpc1:03046] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> >>> sunpc1
> >>> tyr openmpi_1.7.x_or_newer 141
> >>>
> >>>
> >>>
> >>> I get the following error if I use two Sparc machines
> >>> (Sun M4000 servers with two quad-core Sparc64 VII processors
> >>> and two hardware threads per core). I'm not sure if this
> >>> worked before or if I have to use different options to make
> >>> it work.
> >>>
> >>> tyr openmpi_1.7.x_or_newer 151 cat rf_rs0_rs1
> >>> rank 0=rs0 slot=0:0-7
> >>> rank 1=rs0 slot=1
> >>> rank 2=rs1 slot=0
> >>> rank 3=rs1 slot=1
> >>> tyr openmpi_1.7.x_or_newer 152
> >>>
> >>> rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> >>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in
> >>> file ../../../../../openmpi-1.8.2rc3/orte/mca/rmaps/rank_file/rmaps_rank_file.c
> >>> at line 279
> >>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in
> >>> file ../../../../openmpi-1.8.2rc3/orte/mca/rmaps/base/rmaps_base_map_job.c
> >>> at line 285
> >>> rs0 openmpi_1.7.x_or_newer 105
> >>>
> >>>
> >>> It works for the following command.
> >>>
> >>> rs0 openmpi_1.7.x_or_newer 107 mpiexec --report-bindings -np 4 --host rs0,rs1 --bind-to hwthread hostname
> >>> [rs0.informatik.hs-fulda.de:26102] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
> >>> [rs0.informatik.hs-fulda.de:26102] MCW rank 1 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
> >>> rs0.informatik.hs-fulda.de
> >>> rs0.informatik.hs-fulda.de
> >>> rs1.informatik.hs-fulda.de
> >>> [rs1.informatik.hs-fulda.de:28740] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
> >>> [rs1.informatik.hs-fulda.de:28740] MCW rank 3 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
> >>> rs1.informatik.hs-fulda.de
> >>> rs0 openmpi_1.7.x_or_newer 108
> >>>
> >>>
> >>> I would be grateful if somebody could fix the problem. Please let
> >>> me know if I can provide anything else. Thank you very much for
> >>> any help in advance.
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/users/2014/08/24907.php
> >>
> >>
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/08/24936.php
>
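One sketch of a possible workaround, as a postscript to the thread above. The "Value : 1,1" in the warnings suggests the hwloc_base_report_bindings parameter is being registered twice from the command line. Open MPI also reads any MCA parameter from an environment variable named OMPI_MCA_<param_name>, so the flag could be supplied once via the environment instead; whether this actually avoids the duplicated "1,1" value in this particular build is an untested assumption.

```shell
# Assumption: setting the MCA parameter once via the OMPI_MCA_* environment
# mechanism (a standard Open MPI feature) may avoid the duplicated "1,1"
# value produced when the option appears on the mpiexec command line.
export OMPI_MCA_hwloc_base_report_bindings=1

# Then run the job with no binding-report option on the command line, e.g.:
#   mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname

# Show the value Open MPI would pick up from the environment.
echo "$OMPI_MCA_hwloc_base_report_bindings"
```

The mpiexec invocation is left as a comment because it requires the cluster from the thread; only the environment-variable setup is shown live.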