Hmmm...you shouldn't need to specify a hostfile in addition to the rankfile, so 
something has gotten messed up in the allocator. I'll take a look at it.

As for cpus-per-proc, I'm hoping to tackle it over the holiday while I take a 
break from my regular job. Will let you know when fixed.
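
As a stop-gap until then: binding each process to something bigger than a
core should at least keep a threaded process from being squeezed onto a
single core. Untested, and adjust the hosts to your setup, but something
along these lines:

  mpiexec -np 4 -host sunpc0,sunpc1 -map-by socket -bind-to socket \
    -report-bindings hostname

That isn't as fine-grained as cpus-per-proc, of course, but it may tide
you over.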

Thanks for your patience!


On Dec 15, 2012, at 1:41 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi Ralph
> 
>>> some weeks ago (mainly at the beginning of October) I reported
>>> several problems, and I would be grateful if you could tell me
>>> whether and, if possible, when somebody will try to solve them.
>>> 
>>> 1) I don't get the expected results when I try to send or scatter
>>>  the columns of a matrix in Java. In a homogeneous environment the
>>>  received column values have nothing to do with the original values,
>>>  and in a heterogeneous environment the program breaks with "An error
>>>  occurred in MPI_Comm_dup" and "MPI_ERR_INTERN: internal error".
>>>  I would like to use the Java API.
>>> 
>>> 2) I don't get the expected result when I try to scatter an object
>>>  in Java.
>>>  https://svn.open-mpi.org/trac/ompi/ticket/3351
>> 
>> Nothing has happened on these yet
> 
> Do you have an idea when somebody will have time to fix these problems?
> 
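FWIW, on the column case specifically: a Java double[rows][cols] is an
array of row arrays, so a column is not a contiguous block of memory and
has to be packed into a contiguous buffer (or described with a derived
datatype) before it can be scattered. Just to illustrate the packing side,
here is a rough, untested sketch in plain Java - the names are made up and
the actual scatter call through the Java bindings is deliberately left out:

  // ColumnPack.java - shows why scattering matrix columns needs a packing
  // step: double[rows][cols] is stored row by row, so a single column is
  // not contiguous in memory.
  public class ColumnPack {

      // Copy the columns into one contiguous buffer, grouped so that
      // process p would receive columns p*colsPerProc .. (p+1)*colsPerProc-1.
      static double[] packColumns(double[][] m, int colsPerProc, int numProcs) {
          int rows = m.length;
          double[] buf = new double[numProcs * colsPerProc * rows];
          int idx = 0;
          for (int p = 0; p < numProcs; p++) {
              for (int c = 0; c < colsPerProc; c++) {
                  int col = p * colsPerProc + c;
                  for (int r = 0; r < rows; r++) {
                      buf[idx++] = m[r][col];   // one column, top to bottom
                  }
              }
          }
          return buf;
      }

      public static void main(String[] args) {
          double[][] m = { { 1, 2, 3, 4 },
                           { 5, 6, 7, 8 } };
          // 4 processes, 1 column each: expect 1 5 2 6 3 7 4 8
          for (double v : packColumns(m, 1, 4)) System.out.print(v + " ");
          System.out.println();
      }
  }

If your test program already packs the columns like that and the received
values are still wrong, that would point at the bindings rather than at
the test program.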
> 
>>> 3) I still only get a message that all nodes are already filled up
>>>  when I use a "rankfile", and nothing else happens. I would like
>>>  to use a rankfile. You filed a fix for it.
>>> 
>> 
>> I believe rankfile was fixed, at least on the trunk - not sure if it
>> was moved to 1.7. I assume that's the release you are talking about?
> 
> I'm using the trunk for my tests. It didn't work for me because I used
> the rankfile without a hostfile or a hostlist (it is not enough to
> specify the hosts in the rankfile). Everything works fine when I provide
> a "correct" hostfile or hostlist and the binding isn't too complicated
> (see my last example below).
> 
> My rankfile:
> 
> rank 0=sunpc0 slot=0:0
> rank 1=sunpc1 slot=0:0
> rank 2=sunpc0 slot=1:0
> rank 3=sunpc1 slot=1:0
> 
> 
> My hostfile:
> 
> sunpc0 slots=4
> sunpc1 slots=4
> 
> 
> It will not work without a hostfile or hostlist.
> 
> sunpc0 mpi-probleme 128 mpiexec -report-bindings -rf rankfile_1.openmpi \
>  -np 4 hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots.  Please review your rank-slot
> assignments and your host allocation to ensure a proper match.  Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
> 
>  Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 129 
> 
> 
> I get the expected output if I add "-hostfile host_sunpc" or
> "-host sunpc0,sunpc1" on the command line.
> 
> sunpc0 mpi-probleme 129 mpiexec -report-bindings -rf rankfile_1.openmpi \
>  -np 4 -hostfile host_sunpc hostname
> [sunpc0:06954] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc0:06954] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc0
> sunpc0
> [sunpc1:12583] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> [sunpc1:12583] MCW rank 3 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> sunpc1
> sunpc1
> sunpc0 mpi-probleme 130 
> 
> 
> Furthermore, the rankfile and the hostfile must both use the same form
> of hostname, either fully qualified or unqualified. Otherwise it will
> not work, as you can see in the following output, where my hostfile
> contains a fully qualified hostname while my rankfile contains only the
> hostname without the domain name.
> 
> sunpc0 mpi-probleme 131 mpiexec -report-bindings -rf rankfile_1.openmpi \
>  -np 4 -hostfile host_sunpc_full hostname
> ------------------------------------------------------------------------
> The rankfile that was used claimed that a host was either not
> allocated or oversubscribed its slots.  Please review your rank-slot
> assignments and your host allocation to ensure a proper match.  Also,
> some systems may require using full hostnames, such as
> "host1.example.com" (instead of just plain "host1").
> 
>  Host: sunpc1
> ------------------------------------------------------------------------
> sunpc0 mpi-probleme 132 
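
To make the mismatch concrete (illustrative names only, not the real file
contents): a hostfile line such as

  sunpc0.example.com slots=4

will not be matched against a rankfile line such as

  rank 0=sunpc0 slot=0:0

even though both refer to the same machine - the two files have to agree
on the form of the name.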
> 
> 
> Unfortunately my complicated rankfile still doesn't work, although
> you told me some weeks ago that it is correct.
> 
> rank 0=sunpc0 slot=0:0-1,1:0-1
> rank 1=sunpc1 slot=0:0-1
> rank 2=sunpc1 slot=1:0
> rank 3=sunpc1 slot=1:1
> 
> sunpc1 mpi-probleme 103 mpiexec -report-bindings -rf rankfile -np 4 \
>  -hostfile host_sunpc hostname
> sunpc1
> sunpc1
> sunpc1
> [sunpc1:12741] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.]
> [sunpc1:12741] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B]
> [sunpc1:12741] MCW rank 1 bound to socket 0[core 0[hwt 0]],
>   socket 0[core 1[hwt 0]]: [B/B][./.]
> [sunpc0:07075] MCW rank 0 bound to socket 0[core 0[hwt 0]],
>   socket 0[core 1[hwt 0]]: [B/B][./.]
> sunpc0
> sunpc1 mpi-probleme 104 
> 
> The bindings for ranks 1 to 3 are correct, but rank 0 was only bound to
> the two cores of socket 0 ([B/B][./.]) and didn't get the cores from
> the second socket.
> 
> 
> 
>>> 4) I would like "-cpus-per-proc", "-npersocket", etc. to apply to
>>>  each set of machines/applications individually, and not globally to
>>>  all machines/applications, when I specify several colon-separated
>>>  sets of machines or applications on the command line. You told me
>>>  that it could be done.
>>> 
>>> 5) By the way, it seems that the option "-cpus-per-proc" is no
>>>  longer supported in openmpi-1.7 and openmpi-1.9. How can I bind a
>>>  multi-threaded process to more than one core in these versions?
>> 
>> I'm afraid I haven't gotten around to working on cpus-per-proc, though
>> I believe npersocket was fixed.
> 
> Will you also support "-cpus-per-proc" in openmpi-1.7 and openmpi-1.9?
> At the moment it isn't available.
> 
> sunpc1 mpi-probleme 106 mpiexec -report-bindings -np 4 \
>  -host linpc0,linpc1,sunpc0,sunpc1 -cpus-per-proc 4 -map-by core \
>  -bind-to core hostname
> mpiexec: Error: unknown option "-p"
> Type 'mpiexec --help' for usage.
> 
> 
> sunpc1 mpi-probleme 110 mpiexec --help | grep cpus
>                         cpus allocated to this job [default: none]
>   -use-hwthread-cpus|--use-hwthread-cpus 
>                         Use hardware threads as independent cpus
> 
> 
> 
>>> I can provide my small programs once more if you need them. Thank
>>> you very much in advance for any answer.
> 
> Thank you very much for all your help and time
> 
> Siegmar
> 

