On 20.08.2014, at 19:05, Ralph Castain wrote:

>> <snip>
>> Aha, this is quite interesting - how do you do this: by scanning
>> /proc/<pid>/status or the like? What happens if you don't find enough free
>> cores because they are already used up by other applications?
>> 
> 
> Remember, when you use mpirun to launch, we launch our own daemons using the 
> native launcher (e.g., qsub). So the external RM will bind our daemons to the 
> specified cores on each node. We use hwloc to determine what cores our 
> daemons are bound to, and then bind our own child processes to cores within 
> that range.
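
(As a side note: the binding the RM imposed on the `orted` can be inspected on
a node with hwloc's own tools or directly in /proc; a rough sketch, the PID
being just a placeholder:

    hwloc-bind --get --pid <pid of orted>
    grep Cpus_allowed_list /proc/<pid of orted>/status

Both report the set of cores the daemon - and hence its children - is confined
to.)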

Thx for reminding me of this. Indeed, I mixed up two different aspects in this 
discussion.

a) What happens in case no binding was done by the RM at all (hence Open MPI
could use all cores) and two Open MPI jobs (or one Open MPI job plus something
completely different) are running on the same node (the Tight Integration gives
each job its own Open MPI directory in /tmp and its own `orted`)? Will the
second Open MPI job know which cores the first one has already used up? Or will
both use the same set of cores, since "-bind-to none" can't be added to the
given `mpiexec` command: "-map-by slot:pe=$OMP_NUM_THREADS" is used, which
implies "-bind-to core" and can't be switched off? I do see the same cores
being used by both jobs.
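
To make the setup concrete, a rough sketch of what both (independently
submitted) jobs run - application name, process count and thread count are
placeholders only:

    export OMP_NUM_THREADS=4
    mpiexec -map-by slot:pe=$OMP_NUM_THREADS -np 2 ./hybrid_app

As "pe=N" implies binding to cores and, without an RM-imposed limit, each
`orted` sees the whole machine, both jobs apparently pick their cores starting
from the same point - which would explain the overlap I see.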

If I alter the machinefile instead, the processes are not bound to any core,
and the OS takes care of a proper assignment.
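
What I mean by altering the machinefile, as a rough sketch in Open MPI's
hostfile syntax (host names and slot counts are placeholders): a generated
file like

    node01 slots=8
    node02 slots=8

is rewritten with reduced slot counts, one slot per MPI rank instead of one
per core, e.g.

    node01 slots=2
    node02 slots=2

and the job is then started without "-map-by slot:pe=...", so that no binding
is applied and the OS scheduler distributes the threads.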


> If the cores we are bound to are the same on each node, then we will do this 
> with no further instruction. However, if the cores are different on the 
> individual nodes, then you need to add --hetero-nodes to your command line 
> (as the nodes appear to be heterogeneous to us).

b) Aha, so it's not only about different CPU types, but also about the same
CPU type with different core allocations on the individual nodes? It's not
mentioned in the `mpiexec` man page of 1.8.1 though. I'll have a look at it.
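
If I understand it correctly, the flag is simply added to the otherwise
unchanged command line, e.g. (thread count again a placeholder):

    mpiexec --hetero-nodes -map-by slot:pe=$OMP_NUM_THREADS ./hybrid_app

so that Open MPI doesn't assume an identical topology and binding on every
node but gets it back from each daemon.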


> So it is up to the RM to set the constraint - we just live within it.

Fine.

-- Reuti
