Using an svn checkout of the current 1.6 branch, it works fine for me:

[rhc@odin ~/v1.6]$ cat rf
rank 0=odin127 slot=0:0-1,1:0-1
rank 1=odin128 slot=1

[rhc@odin ~/v1.6]$ mpirun -n 2 -rf ./rf --report-bindings hostname
[odin127.cs.indiana.edu:12078] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
[odin128.cs.indiana.edu:12156] MCW rank 1 bound to socket 0[core 1]: [. B][. .] (slot list 1)
odin127.cs.indiana.edu
odin128.cs.indiana.edu
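
For reference, the slot syntax in that rankfile is socket:core-range, and a bare
number is a single logical cpu, which is what the bindings above show (rank 0 got
both sockets of odin127, rank 1 got core 1 on odin128). A minimal sketch, using my
hostnames, that instead gives each rank one full socket would look like:

  rank 0=odin127 slot=0:0-1
  rank 1=odin128 slot=1:0-1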

Note that those two nodes were indeed allocated by Slurm. Are you using a 
resource manager, or is the allocation being defined by the rankfile?

If the latter, please add --display-allocation to your cmd line and let's see 
what it thinks was allocated. Also, if you configure OMPI with --enable-debug, you 
can add "-mca ras_base_verbose 5" to the cmd line to get further diagnostic 
output.
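
Something like this (with your own rankfile substituted for ./rf) would show both:

  mpirun -np 1 -rf ./rf --report-bindings --display-allocation \
      -mca ras_base_verbose 5 hostname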


On Jan 29, 2013, at 10:54 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi
> 
> today I have installed openmpi-1.6.4rc3r27923. Unfortunately I
> still have a problem with rankfiles when I start a process on a
> remote machine.
> 
> 
> tyr rankfiles 114  ssh linpc1 ompi_info | grep "Open MPI:"
>                Open MPI: 1.6.4rc3r27923
> 
> tyr rankfiles 115 cat rf_linpc1
> rank 0=linpc1 slot=0:0-1,1:0-1
> 
> tyr rankfiles 116 mpiexec -report-bindings -np 1 \
>  -rf rf_linpc1 hostname
> ------------------------------------------------------------------
> All nodes which are allocated for this job are already filled.
> ------------------------------------------------------------------
> 
> 
> The following command still works.
> 
> tyr rankfiles 119 mpiexec -report-bindings -np 1 -host linpc1 \
>  -cpus-per-proc 4 -bycore -bind-to-core hostname
> [linpc1:32262] MCW rank 0 bound to socket 0[core 0-1]
>  socket 1[core 0-1]: [B B][B B]
> linpc1
> tyr rankfiles 120 
> 
> 
> Everything is fine, if I use the rankfile on the local machine.
> 
> linpc1 rankfiles 103 ompi_info | grep "Open MPI:"
>                 Open MPI: 1.6.4rc3r27923
> 
> linpc1 rankfiles 104 cat rf_linpc1
> rank 0=linpc1 slot=0:0-1,1:0-1
> 
> linpc1 rankfiles 105 mpiexec -report-bindings -np 1 \
>  -rf rf_linpc1 hostname
> [linpc1:32385] MCW rank 0 bound to socket 0[core 0-1]
>  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
> linpc1
> linpc1 rankfiles 106
> 
> 
> In my opinion it should also work when I start a process on a
> remote machine. Could somebody look into this issue once more?
> Thank you very much for your help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> 