On 16 May 2012 08:44, Arturo <[email protected]> wrote:
> What I want to do is that a user could request 128 slots in just 2 nodes
> of 64 cores. Using the parallel environment
>
> $ qsub -q conmat -pe foobar 5 submit.sh
>
> makes the script run in 5 slots, but they could be assigned
> to 5 different nodes.
>
> If I try to use the built-in complex_value slots:
>
> $ qsub -q conmat -l slots=5 submit.sh
>
> it gives an error, so I can't use it:
>
> Unable to run job: "job" denied: use parallel environments instead of
> requesting slots explicitly.
> Exiting.
>
> So, I have created a new complex_value, slotsfree, similar to slots, but
> one that I can use as requestable and consumable.
>
> But now, thanks to William, we have observed that slotsfree consumption
> is multiplied by the slots configured in the PE.
>
> Do you understand what my problem is?
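For reference, a consumable complex like the thread's slotsfree would be defined via `qconf -mc` with a line of roughly this shape (the shortcut, default, and urgency values here are assumptions, not taken from Arturo's actual configuration):

```
#name       shortcut  type  relop  requestable  consumable  default  urgency
slotsfree   sf        INT   <=     YES          YES         0        0
```

With `consumable` set to `YES`, a requested amount is charged once per granted PE slot, which appears to be exactly the multiplication Arturo observes; some Grid Engine variants later added a `JOB` consumable type that charges the request once per job instead, but whether that is available depends on the version in use.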
It would appear you don't have enough "slotsfree". If I understand you
correctly, you're trying to do something like PBS/Torque's nodes:ppn
request. There was a post to the list not long ago from someone offering
a config for emulating PBS/Torque-like behavior in general.

I don't think the method you are using will work, as requesting a normal
consumable can't force a job on to fewer nodes, only on to more. However,
assuming you want to do this without emulating all of PBS, a more grid
engine way would be to set up a bunch of PEs with different
allocation_rule values, one for each possible value of ppn. With 64 slots
this could be quite a lot of PEs, but you could script their creation.
You might also want to request an exclusive resource when doing this, as
normally you won't want to share the node when doing something like ppn.

William

> Ideally I would like to specify how many slots to use (128 for example)
> and in how many different nodes, but without specifying explicitly
> which nodes.
>
> Many thanks for your help!!!
>
> On 15/05/12 16:32, Reuti wrote:
>> On 15.05.2012 at 16:23, Arturo wrote:
>>
>>> <snip>
>>>
>>> It doesn't matter to which queue I submit the script.
>>>
>>> I would use the built-in slots complex, but when I use it, it gives me
>>> this error:
>>>
>>> qsub -q conmat -l slots=5 submit.sh
>>> Unable to run job: "job" denied: use parallel environments instead of
>>> requesting slots explicitly.
>>> Exiting.
>>
>> As the message says: you are not requesting a PE with the proper slot count?
>>
>> $ qsub -q conmat -pe foobar 5 submit.sh
>>
>> -- Reuti
>>
>>> Regards
>>>
>>> On 15/05/12 16:12, William Hay wrote:
>>>> OK, that makes more sense. The queue instance on node045 is called
>>>> conmat, not test. If test only exists as a single slot on each of
>>>> node046 and node047, then when you request -q test you are restricting
>>>> it to those two slots, which isn't enough for a 4-slot job.
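William's one-PE-per-ppn suggestion can indeed be scripted. A minimal sketch, assuming PE names of the form mpi-Nppn and the thread's 64-core nodes (the slot count, start/stop args, and file paths are illustrative assumptions, not values from the thread):

```shell
#!/bin/sh
# Generate one PE definition per ppn value. allocation_rule=N forces
# exactly N slots of the job onto each node it is scheduled on, so a
# 128-slot job in mpi-64ppn lands on exactly two 64-core nodes.
for ppn in 1 2 4 8 16 32 64; do
  cat > /tmp/pe-mpi-${ppn}ppn <<EOF
pe_name            mpi-${ppn}ppn
slots              99999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    ${ppn}
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
EOF
done
# then register each definition with the qmaster:
#   qconf -Ap /tmp/pe-mpi-<N>ppn
# and attach the PEs to the relevant queue's pe_list.
```

With these in place, Arturo's original goal becomes something like `qsub -q conmat -pe mpi-64ppn 128 submit.sh`.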
>>>> We would really need the full output of qstat -f to be sure, though.
>>>>
>>>> William
>>>>
>>>> On 15 May 2012 14:42, Arturo <[email protected]> wrote:
>>>>
>>>>> More info:
>>>>>
>>>>> output of qstat -f
>>>>>
>>>>> ---------------------------------------------------------------------------------
>>>>> [email protected]   BIP   0/0/64   0.00   lx26-amd64
>>>>> ---------------------------------------------------------------------------------
>>>>> ############################################################################
>>>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>>>> ############################################################################
>>>>>  74550 0.60500 test   arturo   qw   05/15/2012 15:26:50   4
>>>>>
>>>>> qconf -sq test | grep slot
>>>>> slots                 64
>>>>>
>>>>> qconf -sp openmpi | grep slots
>>>>> slots              99999
>>>>> urgency_slots      min
>>>>>
>>>>> Regards
>>>>>
>>>>> On 15/05/12 15:39, Arturo wrote:
>>>>>
>>>>> Hi William,
>>>>>
>>>>> you were right, it was running on various nodes:
>>>>>
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   MASTER
>>>>>                                                         [email protected]   SLAVE
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   SLAVE
>>>>> 74545 0.60500 test   arturo   r   05/15/2012 15:17:46   [email protected]   SLAVE
>>>>>
>>>>> Well, looking more deeply, the problem is that I created a complex value
>>>>> "slotsfree", consumable and requestable, and I assigned it to node045
>>>>> with the value slotsfree=8 (for example).
>>>>>
>>>>> If I submit a job using a parallel environment to this node without
>>>>> configuring this complex_value, it works perfectly.
>>>>> And when I submit a job without using a PE to this node, but with this
>>>>> complex_value configured, it also works.
>>>>> But when I submit the same job using a PE and the complex_value, it
>>>>> doesn't work, and in the output it only says this:
>>>>>
>>>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>>>
>>>>> Is it clearer now? Why does it not work if the PE is configured without
>>>>> a slot limitation, the node has 64 slots, and the slotsfree value is
>>>>> greater than 4?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Regards
>>>>> Arturo
>>>>>
>>>>> On 15/05/12 14:33, William Hay wrote:
>>>>>
>>>>> On 15 May 2012 13:05, Arturo <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have very strange behaviour when I try to use a parallel environment
>>>>> with the hard_queue_list option.
>>>>>
>>>>> In my script I have a parallel configuration:
>>>>>
>>>>> #$ -pe openmpi 4
>>>>>
>>>>> and if I submit the script in the following way it works and runs on
>>>>> test@node045:
>>>>>
>>>>> qsub script.sh
>>>>>
>>>>> But if I submit the script using the hard_queue_list it doesn't run:
>>>>>
>>>>> qsub -q test script.sh
>>>>>
>>>>> With this error:
>>>>>
>>>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>>>
>>>>> Obviously, the node is always empty. What may be wrong?
>>>>>
>>>>> It's hard to diagnose what's going on without knowing more about your
>>>>> configuration.
>>>>> Are you certain the entire job is running in the queue instance
>>>>> test@node045 when you submit without a queue list?
>>>>> One possibility is that queue test@node045 has only two slots. The
>>>>> master slot of the job plus one slave runs in test@node045 while the
>>>>> remaining slots run elsewhere.
>>>>>
>>>>> When the job is running, what output do you get from qstat -g t?
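The "only offers 2 slots" message is consistent with per-slot accounting of the consumable: the scheduler divides the available slotsfree by the amount requested per slot to decide how many PE slots it can offer. A minimal sketch of that arithmetic (the available value 8 comes from the thread; the per-slot request of 4 is an assumption about Arturo's submission, which the thread never shows):

```shell
#!/bin/sh
# With a consumable charged per slot, an n-slot PE job requesting
# -l slotsfree=R needs n*R units of the resource, so a node with
# A units can offer at most floor(A / R) slots to the PE.
available=8   # slotsfree assigned to node045 in the thread
request=4     # assumed per-job request, charged once per PE slot
echo "PE can offer $((available / request)) slots"
```

That would make a 4-slot openmpi job unschedulable on node045 even though the node itself has 64 slots free, matching the error Arturo sees.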
>>>>>
>>>>> William
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
