Hi,

I'm new to rankfiles, so I experimented a little with different
options. I thought that the following entry would be similar to an
entry in an appfile and that MPI could place the process with rank 0
on any core of any processor.

rank 0=tyr.informatik.hs-fulda.de

Unfortunately this is not allowed, and I got an error. Can somebody add
the missing help topic ("no-slot-list") to the help file?


tyr small_prog 126 mpiexec -rf my_rankfile -report-bindings rank_size
--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    no-slot-list
from the file:
    help-rmaps_rank_file.txt
But I couldn't find that topic in the file.  Sorry!
--------------------------------------------------------------------------
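
To make the failing case concrete, here is a minimal sketch in Python of a
checker that flags rankfile entries without a slot list, which is the
condition the "no-slot-list" topic name seems to point at. It is not
Open MPI's parser: the function name check_rankfile and the regular
expression are assumptions, and they only cover the "rank N=host slot=..."
form from the mpiexec man page plus "#" comment lines.

```python
import re

# Hypothetical helper, not Open MPI's own parser. It only recognizes the
# "rank N=host [slot=...]" form shown in the mpiexec man page example.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)(?:\s+slot\s*=\s*(\S+))?\s*$")


def check_rankfile(text):
    """Return (entries, problems) for the rankfile contents in `text`."""
    entries, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        m = RANK_LINE.match(line)
        if m is None:
            problems.append((lineno, "unrecognized line"))
            continue
        rank, host, slot = int(m.group(1)), m.group(2), m.group(3)
        if slot is None:
            problems.append(
                (lineno, "rank %d on %s has no slot list" % (rank, host))
            )
        entries.append((rank, host, slot))
    return entries, problems


sample = """\
rank 0=tyr.informatik.hs-fulda.de
rank 1=tyr.informatik.hs-fulda.de slot=1:0
#rank 2=rs0.informatik.hs-fulda.de slot=0:0
"""
entries, problems = check_rankfile(sample)
for lineno, msg in problems:
    print("line %d: %s" % (lineno, msg))
```

Running it on the sample reports only the first entry, the one written
without "slot=".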


As you can see below I could use a rankfile on my old local machine
(Sun Ultra 45) but not on our "new" one (Sun Server M4000). Today I
logged into the machine via ssh and tried the same command once more
as a local user without success. It's more or less the same error as
before when I tried to bind the process to a remote machine.

rs0 small_prog 118 mpiexec -rf my_rankfile -report-bindings rank_size
[rs0.informatik.hs-fulda.de:13745] [[19734,0],0] odls:default:fork
  binding child [[19734,1],0] to slot_list 0:0
--------------------------------------------------------------------------
We were unable to successfully process/set the requested processor
affinity settings:

Specified slot list: 0:0
Error: Cross-device link

This could mean that a non-existent processor was specified, or
that the specification had improper syntax.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an 
error:

Error name: No such file or directory
Node: rs0.informatik.hs-fulda.de

when attempting to start process rank 0.
--------------------------------------------------------------------------
rs0 small_prog 119 


The application is available.

rs0 small_prog 119 which rank_size
/home/fd1026/SunOS/sparc/bin/rank_size


Is it a problem in the Open MPI implementation or in my rankfile?
How can I find out which sockets and cores per socket are
available so that I can use correct values in my rankfile?
In LAM/MPI I had the command "lamnodes", which I could use to get
such information. Thank you very much for any help in advance.
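
As a stopgap, the "psrinfo -v" output quoted further down in this thread
can at least be summarized by a script. The following is a minimal sketch
in Python under that assumption: the function name summarize_psrinfo is
hypothetical, the patterns are taken only from the output lines shown in
this thread, and the result counts virtual processors. It says nothing
about sockets or cores per socket, which is the information actually
needed for a rankfile.

```python
import re

# Sketch only: summarizes "psrinfo -v" output of the form quoted in this
# thread. It counts virtual processors and does not know sockets or cores.
STATUS = re.compile(r"^Status of virtual processor (\d+) as of:")
CLOCK = re.compile(r"operates at (\d+) MHz")


def summarize_psrinfo(text):
    """Return {virtual processor id: clock in MHz, or None if not seen}."""
    cpus = {}
    current = None
    for line in text.splitlines():
        m = STATUS.match(line.strip())
        if m:
            current = int(m.group(1))
            cpus[current] = None
            continue
        m = CLOCK.search(line)
        if m and current is not None:
            cpus[current] = int(m.group(1))
    return cpus


sample = """\
Status of virtual processor 0 as of: 09/04/2012 07:32:14
  on-line since 08/31/2012 15:44:42.
  The sparcv9 processor operates at 1600 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 09/04/2012 07:32:14
  on-line since 08/31/2012 15:44:39.
  The sparcv9 processor operates at 1600 MHz,
        and has a sparcv9 floating point processor.
"""
cpus = summarize_psrinfo(sample)
print(len(cpus), "virtual processors:", sorted(cpus))
```

On a real machine the text would come from running "psrinfo -v" itself
rather than from the embedded sample.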


Kind regards

Siegmar



> > Are *all* the machines Sparc? Or just the 3rd one (rs0)?
> 
> Yes, both machines are Sparc. I tried first in a homogeneous
> environment.
> 
> tyr fd1026 106 psrinfo -v
> Status of virtual processor 0 as of: 09/04/2012 07:32:14
>   on-line since 08/31/2012 15:44:42.
>   The sparcv9 processor operates at 1600 MHz,
>         and has a sparcv9 floating point processor.
> Status of virtual processor 1 as of: 09/04/2012 07:32:14
>   on-line since 08/31/2012 15:44:39.
>   The sparcv9 processor operates at 1600 MHz,
>         and has a sparcv9 floating point processor.
> tyr fd1026 107 
> 
> My local machine (tyr) is a dual processor machine and the
> other one is equipped with two quad-core processors each
> capable of running two hardware threads.
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> > On Sep 3, 2012, at 12:43 PM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> > 
> > > Hi,
> > > 
> > > the man page for "mpiexec" shows the following:
> > > 
> > >         cat myrankfile
> > >         rank 0=aa slot=1:0-2
> > >         rank 1=bb slot=0:0,1
> > >         rank 2=cc slot=1-2
> > >         mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
> > > 
> > > So that
> > > 
> > >       Rank 0 runs on node aa, bound to socket 1, cores 0-2.
> > >       Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
> > >       Rank 2 runs on node cc, bound to cores 1 and 2.
> > > 
> > > Does it mean that the process with rank 0 should be bound to
> > > core 0, 1, or 2 of socket 1?
> > > 
> > > I tried to use a rankfile and have a problem. My rankfile contains
> > > the following lines.
> > > 
> > > rank 0=tyr.informatik.hs-fulda.de slot=0:0
> > > rank 1=tyr.informatik.hs-fulda.de slot=1:0
> > > #rank 2=rs0.informatik.hs-fulda.de slot=0:0
> > > 
> > > 
> > > Everything is fine if I use the file with just my local machine
> > > (the first two lines).
> > > 
> > > tyr small_prog 115 mpiexec -report-bindings -rf my_rankfile rank_size
> > > [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
> > >  odls:default:fork binding child [[9849,1],0] to slot_list 0:0
> > > [tyr.informatik.hs-fulda.de:01133] [[9849,0],0]
> > >  odls:default:fork binding child [[9849,1],1] to slot_list 1:0
> > > I'm process 0 of 2 available processes running on tyr.informatik.hs-fulda.de.
> > > MPI standard 2.1 is supported.
> > > I'm process 1 of 2 available processes running on tyr.informatik.hs-fulda.de.
> > > MPI standard 2.1 is supported.
> > > tyr small_prog 116 
> > > 
> > > 
> > > I can also change the socket number and the processes will be attached
> > > to the correct cores. Unfortunately it doesn't work if I add
> > > another machine (the third line).
> > > 
> > > 
> > > tyr small_prog 112 mpiexec -report-bindings -rf my_rankfile rank_size
> > > --------------------------------------------------------------------------
> > > We were unable to successfully process/set the requested processor
> > > affinity settings:
> > > 
> > > Specified slot list: 0:0
> > > Error: Cross-device link
> > > 
> > > This could mean that a non-existent processor was specified, or
> > > that the specification had improper syntax.
> > > --------------------------------------------------------------------------
> > > [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >  odls:default:fork binding child [[10212,1],0] to slot_list 0:0
> > > [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >  odls:default:fork binding child [[10212,1],1] to slot_list 1:0
> > > [rs0.informatik.hs-fulda.de:12047] [[10212,0],1]
> > >  odls:default:fork binding child [[10212,1],2] to slot_list 0:0
> > > [tyr.informatik.hs-fulda.de:01520] [[10212,0],0]
> > >  ORTE_ERROR_LOG: A message is attempting to be sent to a process
> > >  whose contact information is unknown in file
> > >  ../../../../../openmpi-1.6/orte/mca/rml/oob/rml_oob_send.c at line 145
> > > [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] attempted to send
> > >  to [[10212,1],0]: tag 20
> > > [tyr.informatik.hs-fulda.de:01520] [[10212,0],0] ORTE_ERROR_LOG:
> > >  A message is attempting to be sent to a process whose contact
> > >  information is unknown in file
> > >  ../../../../openmpi-1.6/orte/mca/odls/base/odls_base_default_fns.c
> > >  at line 2501
> > > --------------------------------------------------------------------------
> > > mpiexec was unable to start the specified application as it
> > >  encountered an error:
> > > 
> > > Error name: Error 0
> > > Node: rs0.informatik.hs-fulda.de
> > > 
> > > when attempting to start process rank 2.
> > > --------------------------------------------------------------------------
> > > tyr small_prog 113 
> > > 
> > > 
> > > 
> > > The other machine has two 8 core processors.
> > > 
> > > tyr small_prog 121 ssh rs0 psrinfo -v
> > > Status of virtual processor 0 as of: 09/03/2012 19:51:15
> > >  on-line since 07/26/2012 15:03:14.
> > >  The sparcv9 processor operates at 2400 MHz,
> > >        and has a sparcv9 floating point processor.
> > > Status of virtual processor 1 as of: 09/03/2012 19:51:15
> > > ...
> > > Status of virtual processor 15 as of: 09/03/2012 19:51:15
> > >  on-line since 07/26/2012 15:03:16.
> > >  The sparcv9 processor operates at 2400 MHz,
> > >        and has a sparcv9 floating point processor.
> > > tyr small_prog 122 
> > > 
> > > 
> > > 
> > > Is it necessary to specify another option on the command line or
> > > is my rankfile faulty? Thank you very much for any suggestions in
> > > advance.
> > > 
> > > 
> > > Kind regards
> > > 
> > > Siegmar
> > > 
> > > 
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> > 
> 
