Hi,

> On 09.03.2015 at 14:02, Rafael Arco Arredondo <rafaa...@ugr.es> wrote:
>
> Hello again,
>
> Now we have a different situation. I'll try to explain it as best as I
> can. We have two different sets of nodes, A and B, A with 16 cores per
> host and B with 32. We want to put all the nodes in only one cluster
> queue, which also has two parallel environments configured, pe16 and
> pe32, pe16 with an allocation_rule of 16 and pe32 with 32. The queue has
> the following definition for the number of slots:

Before going into detail: did you attach both PEs to all machines, or was it
split too, like the slots:

pe_list pe16,[@B=pe32]

?

-- Reuti
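For illustration, a minimal sketch of such a single-queue setup as a
"qconf -sq" excerpt. The slots and pe_list lines are the ones discussed in
this thread; the queue name all.q, the hostlist, and the hostgroup name @A
for the 16-core set are assumptions, not taken from the thread:

  qname     all.q
  hostlist  @A @B
  slots     16,[@B=32]
  pe_list   pe16,[@B=pe32]

If the pe_list is split per hostgroup like this, a job requesting pe16 can
only be placed on @A queue instances and a pe32 job only on @B instances,
which would give exactly the "PE selects the node set" behaviour asked about
below.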
> slots 16,[@B=32]
>
> After submitting several jobs with 32 slots to the queue and pe16 as
> parallel environment, we observed that the first job starts running on
> two of the nodes -not only one- of the B set, when it could run on just
> one node. I understand this is because GE allocates the first 16 slots
> as a block, and then it uses the least loaded machine for the next 16
> (which, of course, is not the first machine). Then, the second job used
> 2 nodes of the A set, which is the behavior that we want.
>
> Of course, the easiest solution is to create two queues, but we don't
> want to do that. Is there any way to avoid the situation that happened
> with the first job? Ideally, we would like to be able to choose the set
> of nodes depending on the parallel environment (that is, if the user
> indicates pe16 then the job is run on set A, and if the user indicates
> pe32 the job is run on set B). I imagine the orthodox way of doing this
> is with two different cluster queues, but I want to make sure it's not
> possible with only one queue.
>
> We would also like to have more information about how GE selects the
> nodes in this type of situation, i.e. when the allocation rule is a
> fixed number and not fill_up or round_robin. Does it select the least
> loaded node? Or the most loaded one? Is there any detailed
> documentation about this? And besides, why does the fill_up rule
> sometimes leave fragmented nodes (that is, when -pe pe16 16 is used and
> 2 nodes are selected, each with 8 slots), even when there are no jobs
> using less than 16 slots? Perhaps because a user submits a job with,
> let's say, 8 slots and it is aborted in error state (Eqw) after
> failing in the prolog (where we check that the number of slots is a
> multiple of 16), and the selected slots are then kept reserved as if
> they were actually in use?
>
> I hope everything is clear enough. Thanks again for your help and best
> regards,
>
> Rafa
>
> On Mon, 2014-09-22 at 12:13 +0200, Rafael Arco Arredondo wrote:
>> Thanks Reuti, we'll try the fixed allocation rules. We will have to
>> create one or two more parallel environments for the different machine
>> types, but I hope it won't disturb our users much.
>>
>> Rafa
>>
>> On Thu, 2014-09-18 at 15:46 +0200, Reuti wrote:
>>> On 18.09.2014 at 14:52, Rafael Arco Arredondo wrote:
>>>
>>>> The hosts are free before running the jobs and are all identical in
>>>> terms of available resources.
>>>>
>>>> Looking a bit deeper into the problem, it seems that sometimes jobs
>>>> requesting only 8 slots are executed on the nodes, bypassing the
>>>> check in the prolog (which says the number of slots has to be a
>>>> multiple of 16).
>>>
>>> In the prolog? At this point the exechost selection was already done.
>>> Do you reschedule the job then?
>>>
>>> You can define fixed allocation rules for each type of machine, e.g.
>>> "allocation_rule 8". Then a job requesting a multiple of 8 can select
>>> the PE by a wildcard:
>>>
>>> qsub -pe baz 32
>>>
>>> will get 4 times 8 cores for sure.
>>>
>>> ===
>>>
>>> To make it a little more flexible:
>>>
>>> PE: foo_2
>>> allocation_rule 2
>>>
>>> PE: foo_24
>>> allocation_rule 4
>>>
>>> PE: foo_248
>>> allocation_rule 8
>>>
>>> PE: foo_24816
>>> allocation_rule 16
>>>
>>> Submission command:
>>>
>>> qsub -pe foo* 16
>>>
>>> => gets 16 slots from any of the PEs (but only from one PE once it's
>>> selected); it could be 8 times 2 slots, or 4 times 4 slots.
>>>
>>> If you don't want too many nodes:
>>>
>>> qsub -pe foo_248* 16
>>>
>>> => I want at least 8 per machine, i.e. 2 times 8 cores or one time
>>> 16 cores.
>>>
>>> -- Reuti
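For reference, one of those fixed-rule PEs written out in full might look
like the sketch below, in "qconf -sp foo_248" format. Only pe_name and
allocation_rule come from the advice above; every other value is a
placeholder, and control_slaves/job_is_first_task in particular depend on
your MPI integration:

  pe_name            foo_248
  slots              9999
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    8
  control_slaves     TRUE
  job_is_first_task  FALSE
  urgency_slots      min

Remember that each foo_* PE also has to appear in the queue's pe_list,
otherwise the wildcard in "qsub -pe foo* 16" has nothing to match against.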
>>>> It's like the prolog sometimes wasn't executed...
>>>>
>>>> On Thu, 2014-09-18 at 14:01 +0200, Winkler, Ursula
>>>> (ursula.wink...@uni-graz.at) wrote:
>>>>> Hi Rafa,
>>>>>
>>>>> Are the jobs scheduled on hosts where other jobs are already running
>>>>> (so that only 8 slots are used on some hosts)? Or are all hosts
>>>>> free? Do all nodes have the same resources (i.e. slots, memory, ...)
>>>>> configured?
>>>>>
>>>>> C., U.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rafael Arco Arredondo [mailto:rafaa...@ugr.es]
>>>>> Sent: Thursday, September 18, 2014 12:23
>>>>> To: Winkler, Ursula (ursula.wink...@uni-graz.at)
>>>>> Cc: users@gridengine.org
>>>>> Subject: Re: AW: [gridengine users] Hosts not fully used with fill up
>>>>>
>>>>> Thanks, Ursula, for your reply, but we need a parallel environment
>>>>> for more than one node (we are using it with MPI). Sorry if I wasn't
>>>>> clear enough.
>>>>>
>>>>> Let me give you an example. We have hosts with 16 slots. Now a user
>>>>> requests 32 slots for a job, but instead of allocating slots on two
>>>>> nodes, sometimes four nodes are used, each with 8 slots. We don't
>>>>> know why this is happening, but it didn't happen with 6.2 with a
>>>>> similar configuration.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Rafa
>>>>>
>>>>> On Thu, 2014-09-18 at 12:11 +0200, Winkler, Ursula
>>>>> (ursula.wink...@uni-graz.at) wrote:
>>>>>> Hi Rafa,
>>>>>>
>>>>>> for such purposes we have configured a separate parallel environment
>>>>>> with "allocation_rule" "$pe_slots" (instead of "$fill_up"). Jobs
>>>>>> scheduled with this rule can ONLY run on one host.
>>>>>>
>>>>>> Regards,
>>>>>> Ursula
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: users-boun...@gridengine.org
>>>>>> [mailto:users-boun...@gridengine.org] On Behalf Of Rafael Arco
>>>>>> Arredondo
>>>>>> Sent: Thursday, September 18, 2014 09:44
>>>>>> To: users@gridengine.org
>>>>>> Subject: [gridengine users] Hosts not fully used with fill up
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> We are having an issue with the parallel environments and the
>>>>>> allocation of slots under the fill_up policy.
>>>>>>
>>>>>> Although we have configured the resource quotas of the queues not to
>>>>>> use more than the number of slots the machines have, and we check in
>>>>>> the prolog that jobs request a number of slots that is a multiple of
>>>>>> the number of physical processors, we are observing that sometimes
>>>>>> the slots of a job are split across several nodes when they should
>>>>>> be running on only one node.
>>>>>>
>>>>>> We are using Open Grid Scheduler 2011.11p1. This didn't happen in
>>>>>> SGE 6.2.
>>>>>>
>>>>>> Has anyone experienced the same situation? Any clues as to why it is
>>>>>> happening?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Rafa
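As an aside, the prolog check mentioned several times in this thread could be
as small as the sketch below. This is a hypothetical reconstruction, not the
poster's actual script: it assumes the prolog runs with the same environment
as the job (so $NSLOTS is set, as queue_conf(5) describes) and that 16 is the
required multiple.

  #!/bin/sh
  # Hypothetical prolog: refuse jobs whose slot count is not a multiple
  # of 16. $NSLOTS is the number of slots granted to the job.
  if [ "$((NSLOTS % 16))" -ne 0 ]; then
      echo "prolog: $NSLOTS slots requested, not a multiple of 16" >&2
      exit 100   # exit code 100 puts the job into error state (Eqw)
  fi
  exit 0

Note Reuti's caveat, though: by the time the prolog runs, the exechost
selection has already been made, so a check like this can only park a
mis-sized job in Eqw after scheduling; it cannot steer the allocation.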
>
> --
> Rafael Arco Arredondo
> Centro de Servicios de Informática y Redes de Comunicaciones
> Campus de Fuentenueva - Edificio Mecenas
> Universidad de Granada
> E-18071 Granada Spain
> Tel: +34 958 241440 Ext:41440  E-mail: rafaa...@ugr.es

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users