On 16.05.2012 at 09:44, Arturo wrote:

> What I want to do is let a user request 128 slots on just 2 nodes of
> 64 cores each. Using the parallel environment
>
> $ qsub -q conmat -pe foobar 5 submit.sh
>
> runs the script in 5 slots, but they could be assigned to 5
> different nodes.
The selection of nodes depends on the allocation_rule in the PE setting.
It's true that this selection criterion is set up by the admin, not by the
user as in Torque. But a second variable slots_free won't help - how should
this variable be used to select any particular number of hosts?

> If I try to use the built-in complex_value slots:
>
> $ qsub -q conmat -l slots=5 submit.sh

To get 5 slots on a single machine the allocation_rule can be set to
$pe_slots. If you have 64-core machines and want to be sure to get 2 and
only 2 machines for a request of 128 slots, the "exclusive" complex feature
can be used if requested in addition to the allocation_rule $fill_up.
Another option could be a fixed allocation_rule of 64, but then you are
limited to multiples of 64, of course.

-- Reuti

> gives an error, so I can't use it:
>
> Unable to run job: "job" denied: use parallel environments instead of
> requesting slots explicitly.
> Exiting.
>
> So I have created a new complex_value, slotsfree, similar to slots, which
> I can use as requestable and consumable.
>
> But now, thanks to William, we have observed that slotsfree consumption
> is multiplied by the slots configured in the PE.
>
> Do you understand what my problem is?
>
> Ideally I would like to specify how many slots to use (128 for example)
> and how many different nodes, but without specifying explicitly which
> nodes.
>
> Many thanks for your help!!!
>
> On 15/05/12 16:32, Reuti wrote:
>> On 15.05.2012 at 16:23, Arturo wrote:
>>
>>> <snip>
>>>
>>> It doesn't matter to which queue I submit the script.
>>>
>>> I would use the built-in slots complex, but when I use it, it gives me
>>> this error:
>>>
>>> qsub -q conmat -l slots=5 submit.sh
>>> Unable to run job: "job" denied: use parallel environments instead of
>>> requesting slots explicitly.
>>> Exiting.
>> As the message says: you are not requesting a PE with the proper slot count?
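[To make the two setups above concrete, here is a minimal sketch of the
relevant settings. The PE name foobar, the queue conmat, and the host
node045 are taken from the thread; the complex attribute name "exclusive"
with shortcut "excl" and the EXCL relational operator are assumptions based
on the exclusive-host feature of Grid Engine 6.2+ - check `man complex` and
`man sge_pe` on your installation.]

```shell
# Pack all requested slots onto one host ("5 slots on a single machine"):
#   qconf -mp foobar      ->  allocation_rule    $pe_slots
#
# Fill hosts one after another ("128 slots on 2 x 64-core nodes"):
#   qconf -mp foobar      ->  allocation_rule    $fill_up
#
# Exclusive host usage: define a boolean consumable in the complex list
# (qconf -mc), e.g.:
#   exclusive   excl   BOOL   EXCL   YES   YES   0   1000
# attach it to each host:
#   qconf -me node045     ->  complex_values     exclusive=true
# and request it at submission time together with the PE, so no other
# job can share the granted hosts:
qsub -q conmat -pe foobar 128 -l exclusive=true submit.sh
```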
>>
>> $ qsub -q conmat -pe foobar 5 submit.sh
>>
>> -- Reuti
>>
>>> Regards
>>>
>>> On 15/05/12 16:12, William Hay wrote:
>>>> Ok, that makes more sense. The queue instance on node045 is called
>>>> conmat, not test. If test only exists as a single slot on each of
>>>> node046 and node047, then when you request -q test you are restricting
>>>> the job to those two slots, which isn't enough for a 4-slot job.
>>>> We would really need the full output of qstat -f to be sure, though.
>>>>
>>>> William
>>>>
>>>> On 15 May 2012 14:42, Arturo <[email protected]> wrote:
>>>>>
>>>>> More info:
>>>>>
>>>>> output of qstat -f
>>>>>
>>>>> ---------------------------------------------------------------------------------
>>>>> [email protected]        BIP   0/0/64   0.00   lx26-amd64
>>>>> ---------------------------------------------------------------------------------
>>>>>
>>>>> ############################################################################
>>>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>>>> ############################################################################
>>>>>   74550 0.60500 test   arturo   qw   05/15/2012 15:26:50   4
>>>>>
>>>>> qconf -sq test | grep slot
>>>>> slots                 64
>>>>>
>>>>> qconf -sp openmpi | grep slots
>>>>> slots              99999
>>>>> urgency_slots      min
>>>>>
>>>>> Regards
>>>>>
>>>>> On 15/05/12 15:39, Arturo wrote:
>>>>>
>>>>> Hi William,
>>>>>
>>>>> you were right, it was running on various nodes:
>>>>>
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   MASTER
>>>>>                                                         [email protected]   SLAVE
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   SLAVE
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   SLAVE
>>>>>
>>>>> Well, looking more deeply, the problem is that I created a complex value
>>>>> "slotsfree", consumable and requestable, and I assigned it to node045
>>>>> with the value slotsfree=8 (for example).
>>>>>
>>>>> If I submit a job using a parallel environment to this node without
>>>>> configuring this complex_value, it works perfectly.
>>>>> And when I submit a job to this node without using a PE, but with this
>>>>> complex_value configured, it also works.
>>>>> But when I submit the same job using a PE and the complex_value, it
>>>>> doesn't work, and the output only says this:
>>>>>
>>>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>>>
>>>>> Is it clearer now? Why does it not work, if the PE is configured without
>>>>> a slot limitation, the node has 64 slots, and the slotsfree value is
>>>>> greater than 4?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Regards
>>>>> Arturo
>>>>>
>>>>> On 15/05/12 14:33, William Hay wrote:
>>>>>
>>>>> On 15 May 2012 13:05, Arturo <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I get very strange behaviour when I try to use a parallel environment
>>>>> with the hard_queue_list option.
>>>>>
>>>>> In my script I have a parallel configuration:
>>>>>
>>>>> #$ -pe openmpi 4
>>>>>
>>>>> and if I submit the script in the following way, it works and runs on
>>>>> test@node045:
>>>>>
>>>>> qsub script.sh
>>>>>
>>>>> But if I submit the script using the hard_queue_list, it doesn't run:
>>>>>
>>>>> qsub -q test script.sh
>>>>>
>>>>> with this error:
>>>>>
>>>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>>>
>>>>> Obviously, the node is always empty. What may be wrong?
>>>>>
>>>>> It's hard to diagnose what's going on without knowing more about your
>>>>> configuration.
>>>>> Are you certain the entire job is running in the queue instance
>>>>> test@node045 when you submit without a queue list?
>>>>> One possibility is that queue test@node045 has only two slots. The
>>>>> master slot of the job plus one slave runs in test@node045 while the
>>>>> remaining slots run elsewhere.
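[For what it's worth, the multiplied consumption observed above is the
normal behaviour of per-slot consumables: with consumable=YES in the
complex definition, a parallel job debits the requested amount once per
granted slot. The sketch below uses the slotsfree attribute from the
thread; the shortcut sf, the requested value of 4, and the exact column
layout are assumptions - see `man complex` for your version.]

```shell
# Per-slot consumable: a 4-slot PE job debits 4 x the requested amount.
#   qconf -mc  ->  slotsfree   sf   INT   <=   YES   YES   0   0
#
# If the job requests, e.g., "-l slotsfree=4" with "-pe openmpi 4", the
# total demand is 4 x 4 = 16, but node045 only offers slotsfree=8, so at
# most 8 / 4 = 2 slots fit - matching the error
# 'cannot run in PE "openmpi" because it only offers 2 slots'.
#
# Per-job consumable (Grid Engine 6.2+): debited once per job, regardless
# of the slot count:
#   qconf -mc  ->  slotsfree   sf   INT   <=   JOB   YES   0   0
qsub -q test -pe openmpi 4 -l slotsfree=4 script.sh
```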
>>>>> When the job is running, what output do you get from qstat -g t?
>>>>>
>>>>> William
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
