Thanks again Reuti. A combination of -q to choose the nodes, np such
that all cores on those nodes are used, and using a re-ordered
machinefile in the job script did what I want.
On 08/25/2014 03:01 AM, Reuti wrote:
Am 23.08.2014 um 16:46 schrieb Noah Knowles:
Hi Reuti,
On 08/23/2014 01:38 AM, Reuti wrote:
Am 23.08.2014 um 02:37 schrieb Reuti:
Hi,
Am 23.08.2014 um 00:43 schrieb Noah Knowles:
Hi, I am using OGS/GE 2011.11p1 on ROCKS. We have a small cluster with a
combination of 12- and 16-core blades. We are running an application where the
specific assignment of ranks to nodes has a big effect on run time. Is it
possible, for example, with NP=64 to specify that
ranks 0-15 go to a 16-core blade,
ranks 16-27 go to a 12-core blade,
ranks 28-39 go to a 12-core blade,
ranks 40-55 go to a 16-core blade, and
ranks 56-63 go to a 12-core blade?
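The requested layout is just the per-node core counts taken in host order and filled rank by rank until NP is reached. A minimal sketch, assuming the host-to-core mapping from this thread (compute-0-4 and compute-0-5 with 16 cores, the others with 12) and a hypothetical helper name `print_rank_ranges`:

```shell
#!/bin/sh
# Hypothetical helper: print which rank range lands on which host when
# ranks are filled host by host in the given order.
# Usage: print_rank_ranges NP host:cores [host:cores ...]
print_rank_ranges()
{
  np=$1
  shift
  first=0
  for entry in "$@"; do
    host=${entry%%:*}
    cores=${entry##*:}
    # stop once all NP ranks have been placed
    [ "$first" -ge "$np" ] && break
    last=$((first + cores - 1))
    # the last host may only be partially filled
    [ "$last" -ge "$np" ] && last=$((np - 1))
    echo "ranks $first-$last -> $host"
    first=$((last + 1))
  done
}

print_rank_ranges 64 compute-0-4:16 compute-0-0:12 compute-0-1:12 \
  compute-0-5:16 compute-0-2:12
```

With the five hosts in this order, the ranges come out as 0-15, 16-27, 28-39, 40-55 and 56-63, which is exactly the layout asked for above.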
I tried, for this example,
qsub -binding linear:64 -l
h="compute-0-4|compute-0-0|compute-0-1|compute-0-5|compute-0-2"
The binding would only be honored (it's a soft request) if there were a node with
64 cores. It must also be activated in "execd_params" in SGE's
configuration.
OK I see. I misunderstood the way that binding works.
(where compute nodes 4 and 5 are 16-core and the others are 12-core), but that gave
me no control over the order in which the nodes were assigned.
We are experimenting with Intel MPI and OpenMPI-- I couldn't figure out how to
do this with the Intel mpirun options, and rankfiles were causing errors, so I
was hoping to accomplish it with qsub.
- Do you have a tight integration of Open MPI into SGE (i.e. compiled with
"--with-sge")?
yes
- All 64 are MPI processes, no OpenMP threads?
correct
- What PE did you use?
orte
- You always want complete machines, i.e. you could also request 68 cores?
yes that would be smarter!
- The rank0 (i.e. where also the jobscript runs) can be selected with:
`qsub -masterq foobar@compute-0-4 ...`
- Additional machines with:
"... -q
foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2"
(foobar@compute-0-4 needs to be listed in both options; the order of the hosts is
not guaranteed)
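Put together with a parallel environment request, the submission could look like the sketch below. It assumes the "orte" PE mentioned above, a queue that is really named foobar (a placeholder here), the 68 slots of the five complete machines, and a hypothetical job script name job.sh:

```shell
qsub -pe orte 68 \
     -masterq foobar@compute-0-4 \
     -q foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2 \
     job.sh
```

The -masterq option only pins where rank 0 and the job script run; the order of the remaining hosts still has to be fixed inside the job script.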
Creating a rankfile out of the granted machinefile should work (i.e. keeping
the allocation). As long as you are alone on these machines, it's better to let
Open MPI do the final binding to cores.
Jobscript:
# Reorder in the way you need them ("sort" is only a placeholder)
sort "$PE_HOSTFILE" > RESORTED_HOSTFILE
export PE_HOSTFILE="$PWD/RESORTED_HOSTFILE"

# Print one rankfile line per granted slot ("rank N=host slot=I"),
# stopping as soon as $1 ranks have been written
PeHostfile2RankFile()
{
  rank=0
  while read -r fqdn nslots rest; do
    host=${fqdn%%.*}    # short host name without the domain
    i=0
    while [ "$i" -lt "$nslots" ]; do
      echo "rank $rank=$host slot=$i"
      rank=$((rank + 1))
      i=$((i + 1))
      # leave both loops once enough ranks are written
      if [ "$rank" -ge "$1" ]; then
        return
      fi
    done
  done < RESORTED_HOSTFILE
}

PeHostfile2RankFile 64 > RANKFILE
mpiexec -np 64 --rankfile RANKFILE ./mpihello
(I don't have such machines, so I gave all ranks the same core [slot=0] only to
get the list of locations, which seems to work.)
One additional thought: Open MPI fills the machines according to the given
machinefile. Maybe you don't need to provide a rankfile at all once the
machinefile has already been rearranged.
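Instead of a plain sort, the hostfile can be rearranged into an explicit host order before mpiexec starts, which is what finally worked in this thread. A minimal sketch, assuming the usual PE_HOSTFILE layout of one "host slots queue processors" line per machine; the function name reorder_hostfile is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy the lines of a PE_HOSTFILE in the host order
# given as arguments (short host names; the domain suffix is ignored).
# Hosts that are not in the file are skipped silently.
# Usage: reorder_hostfile FILE host [host ...] > NEWFILE
reorder_hostfile()
{
  file=$1
  shift
  for want in "$@"; do
    while read -r fqdn rest; do
      if [ "${fqdn%%.*}" = "$want" ]; then
        echo "$fqdn $rest"
      fi
    done < "$file"
  done
}

# Example use inside the job script:
# reorder_hostfile "$PE_HOSTFILE" compute-0-4 compute-0-0 compute-0-1 \
#   compute-0-5 compute-0-2 > RESORTED_HOSTFILE
# export PE_HOSTFILE="$PWD/RESORTED_HOSTFILE"
```

Only the line order changes; the slot counts granted by SGE are kept, so the allocation itself is untouched.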
OK thanks, I'll try that Monday or when the kids are sleeping. Even if I don't
need it, it's helpful to see the script too.
Thanks so much for your very helpful (and quick) replies Reuti!
One additional note I forgot to mention: using hostgroups or a pattern, you
could also shorten the list of machines:
'... -q foobar@compute-0-[40152]', '... -q "*@*0-[40152]"' or even '... -q
"*@*[40152]"'
depending on the names of your queues/machines (see man `sge_types` section
"pattern").
-- Reuti
Noah
-- Reuti
-- Reuti
I hope I'm asking this in the right place-- sorry if not.
Thanks for any help!
Noah
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users