Hmmm... you shouldn't need to specify a hostfile in addition to the rankfile, so something has gotten messed up in the allocator. I'll take a look at it.
As for cpus-per-proc, I'm hoping to tackle it over the holiday while I take a break from my regular job. Will let you know when it's fixed. Thanks for your patience!

On Dec 15, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Ralph
>
>>> some weeks ago (mainly in the beginning of October) I reported
>>> several problems and I would be grateful if you could tell me if
>>> and roughly when somebody will try to solve them.
>>>
>>> 1) I don't get the expected results when I try to send or scatter
>>> the columns of a matrix in Java. The received column values have
>>> nothing to do with the original values if I use a homogeneous
>>> environment, and the program breaks with "An error occurred in
>>> MPI_Comm_dup" and "MPI_ERR_INTERN: internal error" if I use
>>> a heterogeneous environment. I would like to use the Java API.
>>>
>>> 2) I don't get the expected result when I try to scatter an object
>>> in Java.
>>> https://svn.open-mpi.org/trac/ompi/ticket/3351
>>
>> Nothing has happened on these yet
>
> Do you have an idea when somebody will have time to fix these problems?
>
>
>>> 3) I still get only a message that all nodes are already filled up
>>> when I use a "rankfile" and nothing else happens. I would like
>>> to use a rankfile. You filed a bug fix for it.
>>>
>>
>> I believe rankfile was fixed, at least on the trunk - not sure if it
>> was moved to 1.7. I assume that's the release you are talking about?
>
> I'm using the trunk for my tests. It didn't work for me because I used
> the rankfile without a hostfile or a hostlist (it is not enough to
> specify the hosts in the rankfile). Everything works fine when I provide
> a "correct" hostfile or hostlist and the binding isn't too complicated
> (see my last example below).
>
> My rankfile:
>
> rank 0=sunpc0 slot=0:0
> rank 1=sunpc1 slot=0:0
> rank 2=sunpc0 slot=1:0
> rank 3=sunpc1 slot=1:0
>
>
> My hostfile:
>
> sunpc0 slots=4
> sunpc1 slots=4
>
>
> It will not work without a hostfile or hostlist.
>
> sunpc0 mpi-probleme 128 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots. Please review your rank-slot
> assignments and your host allocation to ensure a proper match. Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
>
> Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 129
>
>
> I get the expected output if I add "-hostfile host_sunpc" or
> "-host sunpc0,sunpc1" on the command line.
>
> sunpc0 mpi-probleme 129 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 -hostfile host_sunpc hostname
> [sunpc0:06954] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc0:06954] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc0
> sunpc0
> [sunpc1:12583] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc1:12583] MCW rank 3 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc1
> sunpc1
> sunpc0 mpi-probleme 130
>
>
> Furthermore, the rankfile and the hostfile must both use either
> qualified or unqualified hostnames consistently. Otherwise it will
> not work, as you can see in the following output, where my hostfile
> contains a qualified hostname and my rankfile only the hostname
> without the domain name.
>
> sunpc0 mpi-probleme 131 mpiexec -report-bindings -rf rankfile_1.openmpi \
> -np 4 -hostfile host_sunpc_full hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots. Please review your rank-slot
> assignments and your host allocation to ensure a proper match. Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
>
> Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 132
>
>
> Unfortunately, my complicated rankfile still doesn't work, although
> you told me some weeks ago that it is correct.
>
> rank 0=sunpc0 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
>
> sunpc1 mpi-probleme 103 mpiexec -report-bindings -rf rankfile -np 4 \
> -hostfile host_sunpc hostname
> sunpc1
> sunpc1
> sunpc1
> [sunpc1:12741] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> [sunpc1:12741] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B]
> [sunpc1:12741] MCW rank 1 bound to socket 0[core 0[hwt 0]],
> socket 0[core 1[hwt 0]]: [B/B][./.]
> [sunpc0:07075] MCW rank 0 bound to socket 0[core 0[hwt 0]],
> socket 0[core 1[hwt 0]]: [B/B][./.]
> sunpc0
> sunpc1 mpi-probleme 104
>
> The bindings for ranks 1 to 3 are correct, but rank 0 didn't get the
> cores from the second socket.
>
>
>
>>> 4) I would like to have "-cpus-per-proc", "-npersocket", etc. for
>>> every set of machines/applications, and not globally for all
>>> machines/applications, when I specify several colon-separated sets
>>> of machines or applications on the command line. You told me that
>>> it could be done.
>>>
>>> 5) By the way, it seems that the option "-cpus-per-proc" is no
>>> longer supported in openmpi-1.7 and openmpi-1.9. How can I bind a
>>> multi-threaded process to more than one core in these versions?
>>
>> I'm afraid I haven't gotten around to working on cpus-per-proc, though
>> I believe npersocket was fixed.
>
> Will you also support "-cpus-per-proc" in openmpi-1.7 and openmpi-1.9?
> At the moment it isn't available.
>
> sunpc1 mpi-probleme 106 mpiexec -report-bindings -np 4 \
> -host linpc0,linpc1,sunpc0,sunpc1 -cpus-per-proc 4 -map-by core \
> -bind-to core hostname
> mpiexec: Error: unknown option "-p"
> Type 'mpiexec --help' for usage.
>
>
> sunpc1 mpi-probleme 110 mpiexec --help | grep cpus
> cpus allocated to this job [default: none]
> -use-hwthread-cpus|--use-hwthread-cpus
> Use hardware threads as independent cpus
>
>
>
>>> I can provide my small programs once more if you need them. Thank
>>> you very much for any answer in advance.
>
> Thank you very much for all your help and time.
>
> Siegmar
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
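
[Editor's note on item (1): Java 2-D arrays are arrays of row references, and the
Open MPI Java bindings scatter flat, contiguous buffers, so a column-wise scatter
only behaves as expected if the columns are first packed into a 1-D array on the
root. The sketch below is only a hedged illustration of that packing step; it is
not the code from the original report, the class name and matrix shape are made
up, and the method names follow the mpiJava-style bindings of that era (Rank/Size/
Scatter with explicit offsets), which differ from the lower-case API in later
releases.]

    import mpi.*;

    // Hedged sketch (assumed mpiJava-style bindings): scatter one column of a
    // rows x size matrix to each rank by packing the matrix column-major into a
    // flat buffer on the root, then scattering contiguous chunks of that buffer.
    public class ScatterColumns {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank();
            int size = MPI.COMM_WORLD.Size();

            int rows = 4;                         // assumed matrix height
            double[] packed = new double[rows * size];

            if (rank == 0) {
                double[][] matrix = new double[rows][size];
                for (int r = 0; r < rows; r++)
                    for (int c = 0; c < size; c++)
                        matrix[r][c] = r * size + c;
                // pack column-major so each rank's chunk is a whole column
                for (int c = 0; c < size; c++)
                    for (int r = 0; r < rows; r++)
                        packed[c * rows + r] = matrix[r][c];
            }

            double[] column = new double[rows];
            // every rank receives 'rows' contiguous elements = one column
            MPI.COMM_WORLD.Scatter(packed, 0, rows, MPI.DOUBLE,
                                   column, 0, rows, MPI.DOUBLE, 0);

            System.out.println("rank " + rank + " got column "
                               + java.util.Arrays.toString(column));
            MPI.Finalize();
        }
    }

[In C one would typically use a strided derived datatype (MPI_Type_vector) instead
of packing; the explicit packing above simply avoids relying on derived datatypes
from Java.]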