Hello Ralph,

> Try replacing --report-bindings with -mca hwloc_base_report_bindings 1
> and see if that works

I get even more warnings with the new option: the enum warning now
appears three times instead of once. It seems that I only ever get
the bindings for the local machine. I used Solaris Sparc (tyr),
Solaris x86_64 (sunpc1), and Linux x86_64 (linpc1) as local machines
for the following outputs.

tyr openmpi_1.7.x_or_newer 104 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[tyr.informatik.hs-fulda.de:00555] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
tyr.informatik.hs-fulda.de
linpc1
linpc0
sunpc1
tyr openmpi_1.7.x_or_newer 105 



I get the following slightly different output if I run the
command on Linux.

linpc1 openmpi_1.7.x_or_newer 102 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[linpc1:24181] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
linpc1
linpc0
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong.  The
process was killed without launching the target application.  Your job
will now abort.

  Local host:        tyr
  Application name:  /usr/local/bin/hostname
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
  Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
--------------------------------------------------------------------------
sunpc1
linpc1 openmpi_1.7.x_or_newer 103 



I get similar output if I run the command on Solaris x86_64.

sunpc1 openmpi_1.7.x_or_newer 105 mpiexec --mca hwloc_base_report_bindings 1 -np 4 -rf rf_linpc_sunpc_tyr hostname
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An invalid value was supplied for an enum variable.

  Variable     : hwloc_base_report_bindings
  Value        : 1,1
  Valid values : 0: f|false|disabled, 1: t|true|enabled
--------------------------------------------------------------------------
[sunpc1:04874] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
sunpc1
linpc0
linpc1
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong.  The
process was killed without launching the target application.  Your job
will now abort.

  Local host:        tyr
  Application name:  /usr/local/bin/hostname
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
  Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
--------------------------------------------------------------------------
sunpc1 openmpi_1.7.x_or_newer 106 
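
To check which ranks actually report a binding in such an output, something like the following could be used. It is only a minimal sketch, assuming the report lines keep the "[host:pid] MCW rank N ..." format shown in the outputs above. The function names are made up for illustration and are not part of Open MPI.

```python
import re

# Matches the start of binding report lines such as
#   [linpc1:24181] MCW rank 1 bound to socket 0[core 0[hwt 0]], ...
#   [linpc0:04217] MCW rank 0 is not bound (or bound to all available processors)
# Only the line prefix is inspected, so wrapped lines still match.
BINDING_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):\d+\] MCW rank (?P<rank>\d+) (?:bound to|is not bound)"
)


def hosts_reporting_bindings(output):
    """Map each MCW rank with a binding report to the host that printed it."""
    reported = {}
    for line in output.splitlines():
        match = BINDING_RE.match(line.strip())
        if match:
            reported[int(match.group("rank"))] = match.group("host")
    return reported


def missing_ranks(output, nprocs):
    """Return the ranks in range(nprocs) that have no binding report."""
    reported = hosts_reporting_bindings(output)
    return [rank for rank in range(nprocs) if rank not in reported]


if __name__ == "__main__":
    # Binding line and hostnames copied from the Linux run above.
    sample = (
        "[linpc1:24181] MCW rank 1 bound to socket 0[core 0[hwt 0]], "
        "socket 0[core 1[hwt 0]]: [B/B][./.]\n"
        "linpc1\n"
        "linpc0\n"
        "sunpc1\n"
    )
    print(hosts_reporting_bindings(sample))
    print(missing_ranks(sample, 4))
```

For the Linux run above, only rank 1 (on linpc1, the local machine) has a report, so ranks 0, 2, and 3 are listed as missing.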


Kind regards

Siegmar




> On Aug 7, 2014, at 4:04 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> > Hi,
> > 
> >> I can't replicate - this worked fine for me. I'm at a loss as
> >> to how you got that error as it would require some strange
> >> error in the report-bindings option. If you remove that option
> >> from your cmd line, does the problem go away?
> > 
> > Yes.
> > 
> > tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf rf_linpc_sunpc_tyr hostname
> > tyr.informatik.hs-fulda.de
> > linpc0
> > linpc1
> > sunpc1
> > 
> > 
> > tyr openmpi_1.7.x_or_newer 469 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> > --------------------------------------------------------------------------
> > An invalid value was supplied for an enum variable.
> > 
> >  Variable     : hwloc_base_report_bindings
> >  Value        : 1,1
> >  Valid values : 0: f|false|disabled, 1: t|true|enabled
> > --------------------------------------------------------------------------
> > tyr.informatik.hs-fulda.de
> > [tyr.informatik.hs-fulda.de:29900] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> > [linpc0:04217] MCW rank 0 is not bound (or bound to all available processors)
> > [linpc1:23107] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> > linpc0
> > linpc1
> > sunpc1
> > tyr openmpi_1.7.x_or_newer 470 
> > 
> > 
> > 
> > Kind regards
> > 
> > Siegmar
> > 
> > 
> > 
> > 
> >> On Aug 5, 2014, at 12:56 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> yesterday I installed openmpi-1.8.2rc3 on my machines
> >>> (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE
> >>> Linux 12.1 x86_64) with Sun C 5.12. I get an error
> >>> if I use a rankfile that covers all three architectures.
> >>> The error message depends on the local machine on which
> >>> I run "mpiexec". I get a different error if I
> >>> use two "Sparc64 VII" machines (see below).
> >>> 
> >>> tyr openmpi_1.7.x_or_newer 109 cat rf_linpc_sunpc_tyr
> >>> rank 0=linpc0 slot=0:0-1;1:0-1
> >>> rank 1=linpc1 slot=0:0-1
> >>> rank 2=sunpc1 slot=1:0
> >>> rank 3=tyr slot=1:0
> >>> tyr openmpi_1.7.x_or_newer 110 
> >>> 
> >>> 
> >>> I get the following message if I run "mpiexec" on
> >>> Solaris 10 Sparc.
> >>> 
> >>> tyr openmpi_1.7.x_or_newer 110 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> --------------------------------------------------------------------------
> >>> An invalid value was supplied for an enum variable.
> >>> 
> >>> Variable     : hwloc_base_report_bindings
> >>> Value        : 1,1
> >>> Valid values : 0: f|false|disabled, 1: t|true|enabled
> >>> --------------------------------------------------------------------------
> >>> [tyr.informatik.hs-fulda.de:26960] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> >>> tyr.informatik.hs-fulda.de
> >>> [linpc1:12109] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> [linpc0:26642] MCW rank 0 is not bound (or bound to all available processors)
> >>> linpc1
> >>> linpc0
> >>> sunpc1
> >>> tyr openmpi_1.7.x_or_newer 111 
> >>> 
> >>> 
> >>> 
> >>> I get the following message if I run "mpiexec" on
> >>> Solaris 10 x86_64 or Linux x86_64.
> >>> 
> >>> sunpc1 openmpi_1.7.x_or_newer 109 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> --------------------------------------------------------------------------
> >>> An invalid value was supplied for an enum variable.
> >>> 
> >>> Variable     : hwloc_base_report_bindings
> >>> Value        : 1,1
> >>> Valid values : 0: f|false|disabled, 1: t|true|enabled
> >>> --------------------------------------------------------------------------
> >>> [sunpc1:02931] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> >>> sunpc1
> >>> [linpc0:26850] MCW rank 0 is not bound (or bound to all available processors)
> >>> [linpc1:12386] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> linpc0
> >>> linpc1
> >>> --------------------------------------------------------------------------
> >>> Open MPI tried to bind a new process, but something went wrong.  The
> >>> process was killed without launching the target application.  Your job
> >>> will now abort.
> >>> 
> >>> Local host:        tyr
> >>> Application name:  /usr/local/bin/hostname
> >>> Error message:     hwloc_set_cpubind returned "Error" for bitmap "2"
> >>> Location:          ../../../../../openmpi-1.8.2rc3/orte/mca/odls/default/odls_default_module.c:551
> >>> --------------------------------------------------------------------------
> >>> sunpc1 openmpi_1.7.x_or_newer 110 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> The rankfile worked for older versions of Open MPI.
> >>> 
> >>> tyr openmpi_1.7.x_or_newer 139 ompi_info | grep MPI:
> >>>               Open MPI: 1.8.2a1r31804
> >>> tyr openmpi_1.7.x_or_newer 140 mpiexec -report-bindings -np 4 -rf rf_linpc_sunpc_tyr hostname
> >>> [tyr.informatik.hs-fulda.de:27171] MCW rank 3 bound to socket 1[core 1[hwt 0]]: [.][B]
> >>> tyr.informatik.hs-fulda.de
> >>> [linpc1:12790] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> >>> [linpc0:27221] MCW rank 0 is not bound (or bound to all available processors)
> >>> linpc1
> >>> linpc0
> >>> [sunpc1:03046] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> >>> sunpc1
> >>> tyr openmpi_1.7.x_or_newer 141 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> I get the following error if I use two Sparc machines
> >>> (Sun M4000 servers with two quad-core Sparc64 VII processors
> >>> and two hardware threads per core). I'm not sure if this
> >>> worked before or if I have to use different options to make
> >>> it work.
> >>> 
> >>> tyr openmpi_1.7.x_or_newer 151 cat rf_rs0_rs1
> >>> rank 0=rs0 slot=0:0-7
> >>> rank 1=rs0 slot=1
> >>> rank 2=rs1 slot=0
> >>> rank 3=rs1 slot=1
> >>> tyr openmpi_1.7.x_or_newer 152 
> >>> 
> >>> rs0 openmpi_1.7.x_or_newer 104 mpiexec --report-bindings --use-hwthread-cpus -np 4 -rf rf_rs0_rs1 hostname
> >>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in file ../../../../../openmpi-1.8.2rc3/orte/mca/rmaps/rank_file/rmaps_rank_file.c at line 279
> >>> [rs0.informatik.hs-fulda.de:26085] [[28578,0],0] ORTE_ERROR_LOG: Not found in file ../../../../openmpi-1.8.2rc3/orte/mca/rmaps/base/rmaps_base_map_job.c at line 285
> >>> rs0 openmpi_1.7.x_or_newer 105 
> >>> 
> >>> 
> >>> It works for the following command.
> >>> 
> >>> rs0 openmpi_1.7.x_or_newer 107 mpiexec --report-bindings -np 4 --host rs0,rs1 --bind-to hwthread hostname
> >>> [rs0.informatik.hs-fulda.de:26102] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
> >>> [rs0.informatik.hs-fulda.de:26102] MCW rank 1 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
> >>> rs0.informatik.hs-fulda.de
> >>> rs0.informatik.hs-fulda.de
> >>> rs1.informatik.hs-fulda.de
> >>> [rs1.informatik.hs-fulda.de:28740] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../..][../../../..]
> >>> [rs1.informatik.hs-fulda.de:28740] MCW rank 3 bound to socket 1[core 4[hwt 0]]: [../../../..][B./../../..]
> >>> rs1.informatik.hs-fulda.de
> >>> rs0 openmpi_1.7.x_or_newer 108 
> >>> 
> >>> 
> >>> I would be grateful if somebody could fix the problem. Please let
> >>> me know if I can provide anything else. Thank you very much for
> >>> any help in advance.
> >>> 
> >>> 
> >>> Kind regards
> >>> 
> >>> Siegmar
> >>> 
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24907.php
> >> 
> >> 
> > 
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24936.php
> 
