Ok, that makes more sense.  The queue instance on node045 is called
conmat, not test.  If test only exists as a single slot on each of
node046 and node047,
then when you request -q test you are restricting the job to those two
slots, which isn't enough for a 4-slot job.
We would really need the full output of qstat -f to be sure, though.
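As a quick check in the meantime, something like the following should
show how many slots each queue instance of test offers (standard SGE
commands, queue name taken from your mail):

    qstat -f -q test
    qconf -sq test | grep -E 'hostlist|slots'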


William
On 15 May 2012 14:42, Arturo <[email protected]> wrote:
> More info:
>
> output of qstat -f
>
> ---------------------------------------------------------------------------------
> [email protected]      BIP   0/0/64         0.00     lx26-amd64
> ---------------------------------------------------------------------------------
>
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>   74550 0.60500 test     arturo       qw    05/15/2012 15:26:50     4
>
> qconf -sq test |grep slot
>
>     slots                 64
>
>
> qconf -sp openmpi |grep slots
>
> slots              99999
> urgency_slots      min
>
> Regards
>
> On 15/05/12 15:39, Arturo wrote:
>
> Hi William,
>
> you were right, it was running on several nodes:
>
>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46 [email protected]      MASTER
>                                                                 [email protected]      SLAVE
>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46 [email protected]        SLAVE
>   74545 0.60500 test     arturo       r     05/15/2012 15:17:46 [email protected]        SLAVE
>
> Well, looking more deeply, the problem is that I created a complex value
> "slotsfree", consumable and requestable, and assigned it to node045 with
> a value such as:
> slotsfree=8
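> (For reference, the assignment can be seen with the standard qconf
> commands, e.g.:
>
>     qconf -sc | grep slotsfree
>     qconf -se node045 | grep complex_values
> )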
>
> If I submit a job using a parallel environment to this node without
> requesting this complex value, it works perfectly.
> And when I submit a job to this node without using a PE, but with this
> complex value requested, it also works,
> but when I submit the same job using both a PE and the complex value, it
> doesn't work, and the output only says this:
>
> cannot run in PE "openmpi" because it only offers 2 slots
>
>
> Is it clearer now? Why doesn't it work, if the PE is configured without a slot
> limitation, the node has 64 slots, and the slotsfree value is greater than 4?
>
> Thanks for your help.
>
> Regards
> Arturo
>
>
> On 15/05/12 14:33, William Hay wrote:
>
> On 15 May 2012 13:05, Arturo <[email protected]> wrote:
>
> Hi,
>
> I have a very strange behaviour when I try to use a parallel environment
> with the hard_queue_list option.
>
> In my script I have a parallel configuration:
>
>      #$ -pe openmpi 4
>
> and if I submit the script in the following way, it works and runs in the
> queue instance test@node045:
>
>      qsub script.sh
>
> But if I submit the script using the hard_queue_list, it doesn't run:
>
>      qsub -q test script.sh
>
> With this error:
>
>      cannot run in PE "openmpi" because it only offers 2 slots
>
> Obviously, the node is always empty. What could be wrong?
>
> It's hard to diagnose what's going on without knowing more about your
> configuration.
> Are you certain the entire job is running in the queue instance
> test@node045 when you submit without a queue list?
> One possibility is that queue test@node045 has only two slots.  The
> master slot of the job plus one slave run in test@node045 while the
> remaining slots run elsewhere.
>
> When the job is running what output do you get from qstat -g t?
>
> William
>
>
>
>
>
> --
> Arturo Giner Gracia
> HPC research group System Administrator
> Instituto de Biocomputación y Física de Sistemas Complejos (BIFI)
> Universidad de Zaragoza
> e-mail: [email protected]
> phone: (+34) 976762992

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
