Aha - I'm able to replicate it, will fix.

On Jan 29, 2013, at 11:57 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Using an svn checkout of the current 1.6 branch, it works fine for me:
> 
> [rhc@odin ~/v1.6]$ cat rf
> rank 0=odin127 slot=0:0-1,1:0-1
> rank 1=odin128 slot=1
> 
> [rhc@odin ~/v1.6]$ mpirun -n 2 -rf ./rf --report-bindings hostname
> [odin127.cs.indiana.edu:12078] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
> [odin128.cs.indiana.edu:12156] MCW rank 1 bound to socket 0[core 1]: [. B][. .] (slot list 1)
> odin127.cs.indiana.edu
> odin128.cs.indiana.edu
> 
> Note that those two nodes were indeed allocated by Slurm - are you using a resource manager? Or is the allocation being defined by the rankfile?
> 
> If the latter, please add --display-allocation to your cmd line and let's see what it thinks was allocated. Also, if you configure OMPI with --enable-debug, you could add "-mca ras_base_verbose 5" to the cmd line and get further diagnostic output.
> 
> 
> On Jan 29, 2013, at 10:54 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
>> Hi,
>> 
>> today I installed openmpi-1.6.4rc3r27923. Unfortunately I still have
>> a problem with rankfiles if I start a process on a remote machine.
>> 
>> 
>> tyr rankfiles 114 ssh linpc1 ompi_info | grep "Open MPI:"
>> Open MPI: 1.6.4rc3r27923
>> 
>> tyr rankfiles 115 cat rf_linpc1
>> rank 0=linpc1 slot=0:0-1,1:0-1
>> 
>> tyr rankfiles 116 mpiexec -report-bindings -np 1 \
>>   -rf rf_linpc1 hostname
>> ------------------------------------------------------------------
>> All nodes which are allocated for this job are already filled.
>> ------------------------------------------------------------------
>> 
>> 
>> The following command still works:
>> 
>> tyr rankfiles 119 mpiexec -report-bindings -np 1 -host linpc1 \
>>   -cpus-per-proc 4 -bycore -bind-to-core hostname
>> [linpc1:32262] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B]
>> linpc1
>> tyr rankfiles 120
>> 
>> 
>> Everything is fine if I use the rankfile on the local machine:
>> 
>> linpc1 rankfiles 103 ompi_info | grep "Open MPI:"
>> 
>> Open MPI: 1.6.4rc3r27923
>> 
>> linpc1 rankfiles 104 cat rf_linpc1
>> rank 0=linpc1 slot=0:0-1,1:0-1
>> 
>> linpc1 rankfiles 105 mpiexec -report-bindings -np 1 \
>>   -rf rf_linpc1 hostname
>> [linpc1:32385] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
>> linpc1
>> linpc1 rankfiles 106
>> 
>> 
>> In my opinion it should also work when I start a process on a remote
>> machine. Could somebody look into this issue once more? Thank you
>> very much in advance for your help.
>> 
>> 
>> Kind regards
>> 
>> Siegmar
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
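
For reference, the diagnostic flags suggested above can be combined into a single run - just a sketch, assuming an --enable-debug build and the rf_linpc1 rankfile from Siegmar's mail; the exact output will of course differ:

  mpiexec -report-bindings --display-allocation \
      -mca ras_base_verbose 5 -np 1 -rf rf_linpc1 hostname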