I'm not sure of the issue, but as far as I'm aware the cpus-per-proc
functionality has continued to work through all of those releases and into
today. Yes, the syntax changed during the 1.7 series to reflect a broader
desire to consolidate options into a minimum number of MCA parameters - but
the original option was only deprecated and will still work (though we will
emit a deprecation warning). Regardless, the 1.8.1 release should certainly
understand the "pe=3" modifier and do the right thing.
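
To make that concrete, a 2-process job that gives each process 3 processing
elements would look something like the following (./a.out is just a
placeholder for your executable, and the exact option spellings here are from
memory - check "mpirun --help" on your install):

  mpirun -np 2 --map-by slot:pe=3 ./a.out     # current 1.7/1.8 syntax
  mpirun -np 2 --cpus-per-proc 3 ./a.out      # deprecated form, still accepted
                                              # (with a deprecation warning)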

The "processing element (pe)" terminology is one the general community is 
migrating towards as the use of hyperthreads grows. The old "slot" terminology 
simply wasn't accurate enough any more as a processing "slot" could contain 
multiple hardware threads (or even cores), especially if someone is allocating 
"containers". So we adopted the "pe" term as indicating the number of 
processors to be assigned to the process, with "processor" equating to either 
"core" or "hwthread" depending on whether or not you set the 
"use-hwthreads-as-cpus" flag.

The comments regarding the meaning of the term "rank" certainly aren't intended
to be "snide" - they only reflect the fact that the "rank" of a process is only
defined in terms of a given communicator. Thus, one process can have multiple
"ranks" depending on (a) how many communicators have been created, and (b) what
position it occupies within each of those. For example, a process that is rank
3 in MPI_COMM_WORLD may well be rank 0 in a communicator created by
MPI_Comm_split. In general, we had been using the term only in relation to the
initial comm_world communicator, but we unfortunately then started using it in
discussions over comm_spawn and other communicator creation functions - thereby
generating confusion as to which process we were discussing.

We don't support cgroups directly, so if you are using them, it is possible
that we aren't picking up resource limits that the cgroup might be setting. We
*should* be seeing the core limits on the backend nodes, but I can't swear to
it as we haven't (to my knowledge) tested against cgroups.
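
One quick sanity check, assuming hwloc's command-line tools are installed on
the compute nodes: if I remember right, hwloc by default restricts itself to
the cpuset the cgroup allows, so running these from inside the allocation
should show whether the core limits are actually visible on the backend:

  hwloc-ls          # the topology as hwloc sees it inside the cgroup
  hwloc-bind --get  # the cpuset the current process is bound to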


On May 15, 2014, at 11:16 AM, Mark Hahn <h...@mcmaster.ca> wrote:

>> We're open to suggestion, really - just need some help identifying the best
>> way to get this info out there.
> 
> well, OpenMPI information is fragmented and sprayed all over.
> In some places, there is mention of a wiki to be updated with an explanation; 
> for other things, a consumer needs to wander around loosely-related blogs, 
> mail archives, FAQs, usage statements, etc.
> 
> For instance, I've been trying to figure out how to do a simple thing:
> launch a hybrid job.  Assume I have a scheduled, heterogeneous cluster
> where mpirun simply receives a normal nodefile like this:
> 
> clu357
> clu357
> clu357
> clu354
> clu354
> clu354
> 
> and I want to launch a 2-rank, 3-thread-per-rank job.  forget about frills 
> like hwloc or binding.
> 
> back when --cpus-per-proc was around, this was obvious and worked flawlessly. 
>  I honestly can't figure out how it works now, though - for any definition of 
> "now" since:
> 
> http://www.open-mpi.org/community/lists/devel/2011/12/10060.php
> 
> 2011!  then there's a dribble more info in 2014 (!) that hints that "--map-by 
> node:pe=3" might do the trick here:
> 
> http://comments.gmane.org/gmane.comp.clustering.open-mpi.user/21193
> 
> where did "pe" come from?  is it the same as slot, hwthread, core?
> why does the documentation make snide comments about how the conventional
> understanding of "rank" (~ equivalent to process) might not be true?
> 
> most of all, when was the break introduced?  at this point, I tell people
> that 1.4.3 worked, and that everything after that is broken.
> 
> recent releases (I tried 1.7.3, 1.7.5 and 1.8.1) choke on this. I wonder 
> whether it's having trouble with the fact that a job gets an arbitrary set of 
> cores via cgroup, and perhaps hwloc doesn't understand that it can only work 
> within this set...
> 
> 
>>>>   So please see this URL below (especially the first half
>>>>   of it - pages 1 to 20):
>>>>   
>>>> http://www.slideshare.net/jsquyres/open-mpi-explorations-in-process-affinity-eurompi13-presentation
>>>> 
>>>>   Although these slides by Jeff are an explanation of LAMA,
>>>>   which is another mapping system included in the openmpi-1.7
>>>>   series, I guess you can easily understand from them what mapping
>>>>   and binding mean in general terms.
> 
> AFAICT, the lama slide deck seemed to be concerned only with affinity
> settings, which are irrelevant here.
> 
> confused,
> Mark Hahn.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
