Thanks Reuti, we'll try the fixed allocation rules. We will have to create one or two more parallel environments for the different machine types, but I hope it won't disturb our users much.

Rafa

On Thu, 18-09-2014 at 15:46 +0200, Reuti wrote:
> On 18.09.2014 at 14:52, Rafael Arco Arredondo wrote:
>
> > The hosts are free before running the jobs and are all identical in
> > terms of available resources.
> >
> > Looking a bit deeper into the problem, it seems that sometimes jobs
> > requesting only 8 slots are executed on the nodes, overriding the check
> > in the prolog (which says the number of slots has to be a multiple of 16).
>
> In the prolog? At this point the exechost selection was already done. Do you
> reschedule the job then?
>
> You can define fixed allocation rules for each type of machine, i.e. like
> "allocation_rule 8". Then a job requesting a multiple of 8 can select the PE
> by a wildcard.
>
> qsub -pe baz 32
>
> will get 4 times 8 cores for sure
>
> ===
>
> To make it a little bit flexible:
>
> PE: foo_2
> allocation_rule 2
>
> PE: foo_24
> allocation_rule 4
>
> PE: foo_248
> allocation_rule 8
>
> PE: foo_24816
> allocation_rule 16
>
> Submission command:
>
> qsub -pe foo* 16
>
> => gets 16 slots from any of the PEs (but only from one PE once it's
> selected), it could be 8 times 2 slots, or 4 times 4 slots
>
> If you don't want too many nodes:
>
> qsub -pe foo_248* 16
>
> I want at least 8 per machine, i.e. 2 times 8 cores or one time 16 cores
>
> -- Reuti
>
> > It's like the prolog sometimes wasn't executed...
> >
> > On Thu, 18-09-2014 at 14:01 +0200, Winkler, Ursula
> > (ursula.wink...@uni-graz.at) wrote:
> >> Hi Rafa,
> >>
> >> Are the jobs scheduled on hosts where other jobs are already running (so
> >> that only 8 slots are used on some hosts)? Or are all hosts free? Do all
> >> nodes have the same resources (i.e. slots, memory, ...) configured?
> >>
> >> C., U.
> >>
> >> -----Original Message-----
> >> From: Rafael Arco Arredondo [mailto:rafaa...@ugr.es]
> >> Sent: Thursday, 18 September 2014 12:23
> >> To: Winkler, Ursula (ursula.wink...@uni-graz.at)
> >> Cc: users@gridengine.org
> >> Subject: Re: AW: [gridengine users] Hosts not fully used with fill up
> >>
> >> Thanks Ursula for your reply, but we need a parallel environment for more
> >> than one node (we are using it with MPI). Sorry if I wasn't clear enough.
> >>
> >> Let me give you an example. We have hosts with 16 slots. Now the user
> >> requests 32 slots for a job, but instead of allocating slots on two nodes,
> >> sometimes four nodes are used, each with 8 slots. We don't know why this
> >> is happening, but it didn't happen with 6.2 with a similar configuration.
> >>
> >> Cheers,
> >>
> >> Rafa
> >>
> >> On Thu, 18-09-2014 at 12:11 +0200, Winkler, Ursula
> >> (ursula.wink...@uni-graz.at) wrote:
> >>> Hi Rafa,
> >>>
> >>> for such purposes we have configured a separate Parallel Environment with
> >>> "allocation_rule" "$pe_slots" (instead of "$fill_up"). Jobs scheduled
> >>> with this rule can run ONLY on one host.
> >>>
> >>> Regards,
> >>> Ursula
> >>>
> >>> -----Original Message-----
> >>> From: users-boun...@gridengine.org
> >>> [mailto:users-boun...@gridengine.org] On Behalf Of Rafael Arco
> >>> Arredondo
> >>> Sent: Thursday, 18 September 2014 09:44
> >>> To: users@gridengine.org
> >>> Subject: [gridengine users] Hosts not fully used with fill up
> >>>
> >>> Hello everyone,
> >>>
> >>> We are having an issue with the parallel environments and the allocation
> >>> of slots with the fill up policy.
> >>>
> >>> Although we have configured the resource quotas of the queues not to use
> >>> more than the number of slots the machines have, and we check in the
> >>> prolog that jobs are submitted with a number of slots that is a multiple
> >>> of the number of physical processors, we are observing that sometimes the
> >>> slots of a job are split across several nodes, when they should be running
> >>> on only one node.
> >>>
> >>> We are using Open Grid Scheduler 2011.11p1. This didn't happen in SGE 6.2.
> >>>
> >>> Has anyone experienced the same situation? Any clues as to why it is
> >>> happening?
> >>>
> >>> Thanks in advance,
> >>>
> >>> Rafa
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> users@gridengine.org
> >>> https://gridengine.org/mailman/listinfo/users

--
Rafael Arco Arredondo
Centro de Servicios de Informática y Redes de Comunicaciones
Campus de Fuentenueva - Edificio Mecenas
Universidad de Granada
E-18071 Granada Spain
Tel: +34 958 241440 Ext:41440
E-mail: rafaa...@ugr.es
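
For readers following the thread, here is a rough Python sketch of the arithmetic behind Reuti's wildcard suggestion. It is only a model, not Grid Engine code: the PE names and rules are the ones from his example, the helper `candidate_layouts` is hypothetical, and it assumes that PE wildcards behave like shell-style globs and that a fixed `allocation_rule` only accepts slot counts it divides evenly.

```python
from fnmatch import fnmatchcase

# PE names and their fixed allocation rules, taken from Reuti's example.
PES = {"foo_2": 2, "foo_24": 4, "foo_248": 8, "foo_24816": 16}


def candidate_layouts(pattern, slots, pes=PES):
    """Return {pe_name: (hosts, slots_per_host)} for every PE whose name
    matches the wildcard and whose fixed rule divides the request evenly.

    Hypothetical helper: it only models the selection, the real scheduler
    picks a single PE from these candidates.
    """
    layouts = {}
    for name, per_host in pes.items():
        if fnmatchcase(name, pattern) and slots % per_host == 0:
            layouts[name] = (slots // per_host, per_host)
    return layouts


# Models "qsub -pe foo* 16" and "qsub -pe foo_248* 16" from the thread.
print(candidate_layouts("foo*", 16))
print(candidate_layouts("foo_248*", 16))
```

Under these assumptions the second call leaves only the PEs with 8 or 16 slots per host, which is the "at least 8 per machine" case described in the thread.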