On 23.08.2014 at 16:46, Noah Knowles wrote:

> Hi Reuti,
>
> On 08/23/2014 01:38 AM, Reuti wrote:
>> On 23.08.2014 at 02:37, Reuti wrote:
>>
>>> Hi,
>>>
>>> On 23.08.2014 at 00:43, Noah Knowles wrote:
>>>
>>>> Hi, I am using OGS/GE 2011.11p1 on ROCKS. We have a small cluster with a
>>>> combination of 12- and 16-core blades. We are running an application where
>>>> the specific assignment of ranks to nodes has a big effect on run time. Is
>>>> it possible, for example, with NP=64 to specify that
>>>>
>>>> ranks 0-15 go to a 16-core blade,
>>>> ranks 16-27 go to a 12-core blade,
>>>> ranks 28-39 go to a 12-core blade,
>>>> ranks 40-55 go to a 16-core blade, and
>>>> ranks 56-63 go to a 12-core blade?
>>>>
>>>> I tried, for this example,
>>>> qsub -binding linear:64 -l
>>>> h="compute-0-4|compute-0-0|compute-0-1|compute-0-5|compute-0-2"
>>> The binding would only be honored (as it's a soft request) if there were
>>> a node with 64 cores. It must also be activated in "execd_params" in
>>> SGE's configuration.
> OK I see. I misunderstood the way that binding works.
>>>
>>>
>>>> (where compute nodes 4-5 are 16-core and the others are 12-core), but that
>>>> gave me no control over the order in which the nodes were assigned.
>>>>
>>>> We are experimenting with Intel MPI and OpenMPI-- I couldn't figure out
>>>> how to do this with the Intel mpirun options, and rankfiles were causing
>>>> errors, so I was hoping to accomplish it with qsub.
>>> - Do you have a tight integration of Open MPI into SGE (i.e. compiled with
>>> "--with-sge")?
> yes
>>> - All 64 are MPI processes, no OpenMP threads?
> correct
>>> - What PE did you use?
> orte
>>> - You always want complete machines, i.e. you could also request 68 cores?
> yes that would be smarter!
>>> - The rank0 (i.e. where the jobscript also runs) can be selected with:
>>>
>>> `qsub -masterq foobar@compute-0-4 ...`
>>>
>>> - Additional machines with:
>>>
>>> "... -q foobar@compute-0-4,foobar@compute-0-0,foobar@compute-0-1,foobar@compute-0-5,foobar@compute-0-2"
>>>
>>> (foobar@compute-0-4 needs to be listed in both options, no order of hosts
>>> guaranteed)
>>>
>>> Creating a rankfile out of the granted machinefile should work (i.e.
>>> keeping the allocation). As long as you are alone on these machines, it's
>>> better to let Open MPI do the final binding to cores.
>>>
>>> Jobscript:
>>>
>>> # Reorder in the way you need them
>>> sort $PE_HOSTFILE > RESORTED_HOSTFILE
>>> export PE_HOSTFILE=RESORTED_HOSTFILE
>>>
>>> PeHostfile2RankFile()
>>> {
>>>     rank=0
>>>     cat RESORTED_HOSTFILE | while read line; do
>>>         # echo $line
>>>         host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
>>>         nslots=`echo $line|cut -f2 -d" "`
>>>         i=0
>>>         while [ $i -lt $nslots ]; do
>>>             echo "rank $rank=$host slot=$i"
>>>             rank=`expr $rank + 1`
>>>             i=`expr $i + 1`
>>>             if [ $rank -eq "$1" ]; then
>>>                 # leave both loops once the requested number of ranks is reached
>>>                 break 2
>>>             fi
>>>         done
>>>     done
>>> }
>>>
>>> PeHostfile2RankFile 64 > RANKFILE
>>>
>>> mpiexec -np 64 --rankfile RANKFILE ./mpihello
>>>
>>> (I don't have such machines, so I gave all the same core to get only the
>>> list of locations [slots=0], which seems to work)
>> One additional thought: OpenMPI fills the machines according to the given
>> machinefile. Maybe you don't need to provide a rankfile at all when the
>> machinefile has already been rearranged.
> OK thanks, I'll try that Monday or when the kids are sleeping. Even if I
> don't need it, it's helpful to see the script too.
> Thanks so much for your very helpful (and quick) replies Reuti!
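
The "reorder in the way you need them" step in the quoted jobscript is only hinted at with a plain `sort`, so here is a minimal sketch of how an explicit host order could be imposed before the rankfile is written. The names `HOST_ORDER`, `reorder_hostfile` and `hostfile_to_rankfile` are made up for illustration, and the sketch assumes the usual $PE_HOSTFILE layout of one line per host with the host name in column 1 and the granted slot count in column 2. It is untested on a real mixed 12/16-core cluster.

```shell
#!/bin/sh
# Illustrative assumption: the desired placement order, by short host name.
HOST_ORDER="compute-0-4 compute-0-0 compute-0-1 compute-0-5 compute-0-2"

# Print the lines of hostfile $1 in the order of the remaining arguments.
# Hosts not granted by SGE are skipped; domain suffixes are ignored.
reorder_hostfile() {
    hostfile=$1; shift
    for h in "$@"; do
        awk -v h="$h" '{ split($1, a, "."); if (a[1] == h) print }' "$hostfile"
    done
}

# Print "rank N=host slot=I" lines for at most $2 ranks from hostfile $1.
# Numbering stops at the limit even if further hosts remain in the file.
hostfile_to_rankfile() {
    awk -v max="$2" '
        { split($1, a, ".")
          for (i = 0; i < $2 && rank < max; i++)
              printf "rank %d=%s slot=%d\n", rank++, a[1], i }' "$1"
}

# Only do the work inside a parallel job, where SGE sets PE_HOSTFILE.
if [ -n "${PE_HOSTFILE:-}" ]; then
    # HOST_ORDER is left unquoted on purpose so it splits into one word per host.
    reorder_hostfile "$PE_HOSTFILE" $HOST_ORDER > RESORTED_HOSTFILE
    hostfile_to_rankfile RESORTED_HOSTFILE 64 > RANKFILE
fi
```

The resulting RANKFILE would then be passed to `mpiexec -np 64 --rankfile RANKFILE ./mpihello` as in the quoted jobscript. If Open MPI simply fills the machines in machinefile order, the reordered hostfile alone may already be enough.
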

One additional note I forgot to mention: using hostgroups or a pattern, you
could also shorten the list of machines: '... -q foobar@compute-0-[40152]',
'... -q "*@*0-[40152]"' or even '... -q "*@*[40152]"', depending on the names
of your queues/machines (see man `sge_types` section "pattern").

-- Reuti

> Noah
>>
>> -- Reuti
>>
>>
>>> -- Reuti
>>>
>>>
>>>> I hope I'm asking this in the right place-- sorry if not.
>>>> Thanks for any help!
>>>> Noah
>>>> _______________________________________________
>>>> users mailing list
>>>> users@gridengine.org
>>>> https://gridengine.org/mailman/listinfo/users