On Aug 20, 2014, at 11:16 AM, Reuti <re...@staff.uni-marburg.de> wrote:

> On 20.08.2014, at 19:05, Ralph Castain wrote:
> 
>>> <snip>
>>> Aha, this is quite interesting - how do you do this: by scanning 
>>> /proc/<pid>/status or the like? What happens if you don't find enough free 
>>> cores because they are already used up by other applications?
>>> 
>> 
>> Remember, when you use mpirun to launch, we launch our own daemons using the 
>> native launcher (e.g., qsub). So the external RM will bind our daemons to 
>> the specified cores on each node. We use hwloc to determine what cores our 
>> daemons are bound to, and then bind our own child processes to cores within 
>> that range.
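
Just to make that concrete: you can see the constraint the daemons inherit by
querying the binding from inside the allocation with hwloc's command-line tools.
A minimal sketch (the tools ship with hwloc; the cpuset you get back obviously
depends on what your RM actually granted):

    # run on a compute node, inside the job script / allocation
    hwloc-bind --get     # prints the cpuset the current process inherited
    hwloc-ps             # lists processes that have a binding and their cpusets
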
> 
> Thx for reminding me of this. Indeed, I mixed up two different aspects in 
> this discussion.
> 
> a) What will happen in case no binding was done by the RM (hence Open MPI 
> could use all cores) and two Open MPI jobs (or something completely different 
> besides one Open MPI job) are running on the same node (due to the Tight 
> Integration, with two different Open MPI directories in /tmp and two `orted`s, 
> one unique to each job)? Will the second Open MPI job know which cores the 
> first Open MPI job has already used up? Or will both use the same set of 
> cores, since "-bind-to none" can't be set in the given `mpiexec` command when 
> "-map-by slot:pe=$OMP_NUM_THREADS" is used - that mapping forces "-bind-to 
> core" and can't be switched off? I see the same cores being used for both jobs.

Yeah, each mpirun executes completely independently of the other, so neither one 
has any idea what the other is doing - and so the cores will be overloaded. 
Mapping with multiple PEs per process (pe=N) requires bind-to core; otherwise 
there is no way to implement the request.
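
If you want to confirm the overlap, a quick sanity check is to ask the kernel 
for each rank's binding, along the lines of the /proc scan you mentioned 
earlier (a rough sketch; the PIDs are of course job-specific):

    # on the shared node, for each rank PID of both jobs:
    grep Cpus_allowed_list /proc/<pid>/status
    # identical ranges for ranks of different jobs => both jobs are stacked
    # on the same cores
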

> 
> If I alter the machinefile instead, the processes are not bound to any core, 
> and the OS takes care of assigning them properly.
> 
> 
>> If the cores we are bound to are the same on each node, then we will do this 
>> with no further instruction. However, if the cores are different on the 
>> individual nodes, then you need to add --hetero-nodes to your command line 
>> (as the nodes appear to be heterogeneous to us).
> 
> b) Aha, so it's not only about different CPU types, but also the same CPU type 
> with different allocations on the individual nodes? It's not in the `mpiexec` 
> man page of 1.8.1 though. I'll have a look at it.

The man page is probably a little out-of-date in this area - but yes, 
--hetero-nodes is required for *any* difference in the way the nodes appear to 
us (cpus, slot assignments, etc.). The 1.9 series may remove that requirement - 
still looking at it.
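
In other words, for a mapping like the one in your command you would just add 
the flag whenever the per-node allocations can differ. Sketch only - the 
process count and executable are placeholders:

    mpiexec --hetero-nodes -map-by slot:pe=$OMP_NUM_THREADS -np <nprocs> ./your_app
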

> 
> 
>> So it is up to the RM to set the constraint - we just live within it.
> 
> Fine.
> 
> -- Reuti
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/08/25097.php
