Are *all* the machines SPARC, or just the third one (rs0)?

On Sep 3, 2012, at 12:43 PM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> the man page for "mpiexec" shows the following:
> 
>         cat myrankfile
>         rank 0=aa slot=1:0-2
>         rank 1=bb slot=0:0,1
>         rank 2=cc slot=1-2
>         mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
> 
>       So that
> 
>       Rank 0 runs on node aa, bound to socket 1, cores 0-2.
>       Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
>       Rank 2 runs on node cc, bound to cores 1 and 2.
> 
> Does it mean that the process with rank 0 should be bound to
> core 0, 1, or 2 of socket 1?
> 
> I tried to use a rankfile and have a problem. My rankfile contains
> the following lines.
> 
> rank 0=tyr.informatik.hs-fulda.de slot=0:0
> rank 1=tyr.informatik.hs-fulda.de slot=1:0
> #rank 2=rs0.informatik.hs-fulda.de slot=0:0
> 
> 
> Everything is fine if I use the file with just my local machine
> (the first two lines).
> 
> tyr small_prog 115 mpiexec -report-bindings -rf my_rankfile rank_size
> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
>  odls:default:fork binding child [[9849,1],0] to slot_list 0:0
> [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
>  odls:default:fork binding child [[9849,1],1] to slot_list 1:0
> I'm process 0 of 2 available processes running on tyr.informatik.hs-fulda.de.
> MPI standard 2.1 is supported.
> I'm process 1 of 2 available processes running on tyr.informatik.hs-fulda.de.
> MPI standard 2.1 is supported.
> tyr small_prog 116 
> 
> 
> I can also change the socket number and the processes will be attached
> to the correct cores. Unfortunately, it doesn't work if I add another
> machine (the third line, uncommented).
> 
> 
> tyr small_prog 112 mpiexec -report-bindings -rf my_rankfile rank_size
> --------------------------------------------------------------------------
> We were unable to successfully process/set the requested processor
> affinity settings:
> 
> Specified slot list: 0:0
> Error: Cross-device link
> 
> This could mean that a non-existent processor was specified, or
> that the specification had improper syntax.
> --------------------------------------------------------------------------
> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>  odls:default:fork binding child [[10212,1],0] to slot_list 0:0
> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>  odls:default:fork binding child [[10212,1],1] to slot_list 1:0
> [rs0.informatik.hs-fulda.de:12047] [[10212,0],1]
>  odls:default:fork binding child [[10212,1],2] to slot_list 0:0
> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
>  ORTE_ERROR_LOG: A message is attempting to be sent to a process
>  whose contact information is unknown in file
>  ../../../../../openmpi-1.6/orte/mca/rml/oob/rml_oob_send.c at line 145
> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] attempted to send
>  to [[10212,1],0]: tag 20
> [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] ORTE_ERROR_LOG:
>  A message is attempting to be sent to a process whose contact
>  information is unknown in file
>  ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
>  at line 2501
> --------------------------------------------------------------------------
> mpiexec was unable to start the specified application as it
>  encountered an error:
> 
> Error name: Error 0
> Node: rs0.informatik.hs-fulda.de
> 
> when attempting to start process rank 2.
> --------------------------------------------------------------------------
> tyr small_prog 113 
> 
> 
> 
> The other machine has two 8-core processors.
> 
> tyr small_prog 121 ssh rs0 psrinfo -v
> Status of virtual processor 0 as of: 09/03/2012 19:51:15
>  on-line since 07/26/2012 15:03:14.
>  The sparcv9 processor operates at 2400 MHz,
>        and has a sparcv9 floating point processor.
> Status of virtual processor 1 as of: 09/03/2012 19:51:15
> ...
> Status of virtual processor 15 as of: 09/03/2012 19:51:15
>  on-line since 07/26/2012 15:03:16.
>  The sparcv9 processor operates at 2400 MHz,
>        and has a sparcv9 floating point processor.
> tyr small_prog 122 
> 
> 
> 
> Is it necessary to specify another option on the command line, or
> is my rankfile faulty? Thank you very much in advance for any
> suggestions.
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
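
For readers following the rankfile syntax quoted from the man page above, here is a minimal Python sketch of how such lines could be read into (rank, host, socket, cores) entries. It is only an illustration, not Open MPI's own parser: the names RankBinding, expand_cores, and parse_rankfile are hypothetical, it handles only the slot forms shown in the quoted example ("socket:cores" and plain "cores"), and it assumes that lines starting with "#" are comments, as in the rankfile in the post.

```python
import re
from typing import List, NamedTuple, Optional


class RankBinding(NamedTuple):
    """One rankfile entry (hypothetical helper type for illustration)."""
    rank: int
    host: str
    socket: Optional[int]  # None when the slot has no "socket:" prefix
    cores: List[int]


# Matches lines such as "rank 0=aa slot=1:0-2".
_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand_cores(spec: str) -> List[int]:
    """Expand a core list such as "0-2" or "0,1" into explicit core numbers."""
    cores: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


def parse_rankfile(text: str) -> List[RankBinding]:
    """Parse rankfile text, skipping blank lines and "#" comment lines."""
    bindings: List[RankBinding] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: "#" marks a comment, as in the post
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {raw!r}")
        rank, host, slot = match.groups()
        socket, sep, cores = slot.partition(":")
        if sep:
            # "socket:cores" form, e.g. "1:0-2"
            bindings.append(
                RankBinding(int(rank), host, int(socket), expand_cores(cores))
            )
        else:
            # plain "cores" form, e.g. "1-2"
            bindings.append(RankBinding(int(rank), host, None, expand_cores(slot)))
    return bindings


if __name__ == "__main__":
    example = """\
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
"""
    for entry in parse_rankfile(example):
        print(entry)
```

Read this way, the first man page line describes rank 0 on node aa with socket 1 and the core set 0, 1, 2, which matches the man page's wording "bound to socket 1, cores 0-2".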

