On 15.05.2012 at 16:23, Arturo wrote:

> <snip>
>
> It doesn't matter to which queue I submit the script.
>
> I would use the built-in slots complex, but when I use it, it gives me
> this error:
>
> qsub -q conmat -l slots=5 submit.sh
> Unable to run job: "job" denied: use parallel environments instead of
> requesting slots explicitly.
> Exiting.
As the message says: you are not requesting a PE with the proper slot count?

$ qsub -q conmat -pe foobar 5 submit.sh

-- Reuti

> Regards
>
> On 15/05/12 16:12, William Hay wrote:
>> Ok, that makes more sense. The queue instance on node045 is called
>> conmat, not test. If test only exists as a single slot on each of
>> node046 and node047, then when you request -q test you are restricting
>> it to those two slots, which isn't enough for a 4-slot job.
>> We would really need the full output of qstat -f to be sure, though.
>>
>> William
>>
>> On 15 May 2012 14:42, Arturo <[email protected]> wrote:
>>
>>> More info:
>>>
>>> output of qstat -f:
>>>
>>> ---------------------------------------------------------------------------------
>>> [email protected]    BIP    0/0/64    0.00    lx26-amd64
>>> ---------------------------------------------------------------------------------
>>>
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>> 74550 0.60500 test     arturo    qw    05/15/2012 15:26:50    4
>>>
>>> qconf -sq test | grep slot
>>> slots                 64
>>>
>>> qconf -sp openmpi | grep slots
>>> slots              99999
>>> urgency_slots      min
>>>
>>> Regards
>>>
>>> On 15/05/12 15:39, Arturo wrote:
>>>
>>> Hi William,
>>>
>>> you were right, it was running on several nodes:
>>>
>>> 74545 0.60500 test     arturo    r    05/15/2012 15:17:46
>>>     [email protected]    MASTER
>>>     [email protected]    SLAVE
>>> 74545 0.60500 test     arturo    r    05/15/2012 15:17:46
>>>     [email protected]    SLAVE
>>> 74545 0.60500 test     arturo    r    05/15/2012 15:17:46
>>>     [email protected]    SLAVE
>>>
>>> Well, looking more deeply, the problem is that I created a consumable,
>>> requestable complex value "slotsfree" and assigned it to node045 with
>>> the value slotsfree=8 (for example).
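[Editor's note: for readers unfamiliar with Grid Engine consumables, a complex like the "slotsfree" one described above is typically set up along these lines. This is a sketch only — the complex name, host name, and PE name are taken from this thread; the field values and the exact commands used by the poster are assumptions.]

```shell
# Sketch: defining a consumable, requestable complex such as "slotsfree"
# in Grid Engine (illustrative values, not the poster's actual config).

# 1. Add the complex. "qconf -mc" opens an editor; add a line like:
#    name       shortcut  type  relop  requestable  consumable  default  urgency
#    slotsfree  sf        INT   <=     YES          YES         0        0

# 2. Attach a value to the execution host, e.g. node045:
qconf -rattr exechost complex_values slotsfree=8 node045

# 3. Request it (per slot) together with a PE at submit time:
qsub -pe openmpi 4 -l slotsfree=1 script.sh
```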
>>>
>>> If I submit a job using a parallel environment to this node without
>>> configuring this complex_value, it works perfectly.
>>> And when I submit a job without using a PE to this node, but with this
>>> complex_value configured, it also works.
>>> But when I submit the same job using a PE and the complex_value, it
>>> doesn't work, and the output only says this:
>>>
>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>
>>> Is it clearer now? Why doesn't it work, if the PE is configured without
>>> a slot limitation, the node has 64 slots, and the slotsfree value is
>>> greater than 4?
>>>
>>> Thanks for your help.
>>>
>>> Regards
>>> Arturo
>>>
>>> On 15/05/12 14:33, William Hay wrote:
>>>
>>> On 15 May 2012 13:05, Arturo <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have a very strange behaviour when I try to use a parallel
>>> environment with the hard_queue_list option.
>>>
>>> In my script I have a parallel configuration:
>>>
>>> #$ -pe openmpi 4
>>>
>>> and if I submit the script in the following way, it works and runs in
>>> queue instance test@node045:
>>>
>>> qsub script.sh
>>>
>>> But if I submit the script using the hard_queue_list, it doesn't run:
>>>
>>> qsub -q test script.sh
>>>
>>> With this error:
>>>
>>> cannot run in PE "openmpi" because it only offers 2 slots
>>>
>>> Obviously, the node is always empty. What may be wrong?
>>>
>>> It's hard to diagnose what's going on without knowing more about your
>>> configuration.
>>> Are you certain the entire job is running in the queue instance
>>> test@node045 when you submit without a queue list?
>>> One possibility is that queue test@node045 has only two slots. The
>>> master slot of the job plus one slave run in test@node045 while the
>>> remaining slots run elsewhere.
>>>
>>> When the job is running, what output do you get from qstat -g t?
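[Editor's note: one plausible reading of the "only offers 2 slots" message is Grid Engine's documented per-slot accounting of consumables for parallel jobs: an "-l resource=N" request is debited once per granted slot, so a host can offer at most floor(available / N) slots. The thread never shows the actual -l request, so the request value below is purely hypothetical; the sketch only illustrates the arithmetic.]

```python
# Sketch of per-slot consumable accounting for SGE parallel jobs.
# Hypothetical values: the thread does not state the job's actual
# "-l slotsfree=..." request.

def schedulable_slots(available: int, per_slot_request: int) -> int:
    """Slots a host can offer when every granted slot consumes
    per_slot_request units of a consumable with `available` units left."""
    return available // per_slot_request

# node045 has slotsfree=8. If the job had requested -l slotsfree=4,
# the host could grant only floor(8 / 4) = 2 slots -- consistent with
# 'cannot run in PE "openmpi" because it only offers 2 slots'.
print(schedulable_slots(8, 4))  # -> 2

# Requesting one unit per slot (-l slotsfree=1) would let a 4-slot
# PE job fit comfortably:
print(schedulable_slots(8, 1))  # -> 8
```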
>>>
>>> William
>>>
>>> --
>>> Arturo Giner Gracia
>>> HPC research group System Administrator
>>> Instituto de Biocomputación y Física de Sistemas Complejos (BIFI)
>>> Universidad de Zaragoza
>>> e-mail: [email protected]
>>> phone: (+34) 976762992

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
