More info:

output of qstat -f

---------------------------------------------------------------------------------
[email protected]      BIP   0/0/64         0.00     lx26-amd64
---------------------------------------------------------------------------------

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
  74550 0.60500 test     arturo       qw    05/15/2012 15:26:50     4

qconf -sq test |grep slot

    slots                 64

qconf -sp openmpi |grep slots

slots              99999
urgency_slots      min

Regards

On 15/05/12 15:39, Arturo wrote:
Hi William,

you were right, it was running on several nodes:

74545 0.60500 test arturo r 05/15/2012 15:17:46 [email protected] MASTER
                                                [email protected] SLAVE
74545 0.60500 test arturo r 05/15/2012 15:17:46 [email protected] SLAVE
74545 0.60500 test arturo r 05/15/2012 15:17:46 [email protected] SLAVE

Well, looking more closely, the problem is that I created a complex value "slotsfree" (consumable and requestable) and assigned it to node045 with a value of, for example:
slotsfree=8

If I submit a job to this node using a parallel environment, without this complex_value configured, it works perfectly. If I submit a job to this node without a PE, but with this complex_value configured, it also works. But when I submit the same job using both a PE and the complex_value, it doesn't work, and the output only says this:

cannot run in PE "openmpi" because it only offers 2 slots


Is it clearer now? Why doesn't it work if the PE is configured without a slot limitation, the node has 64 slots, and the slotsfree value is greater than 4?
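
One possible explanation, sketched here as arithmetic only: Grid Engine normally counts a consumable once per slot for parallel jobs, so the requested amount is multiplied by the number of PE slots. The sketch below assumes the job requests slotsfree=4 per slot and the host offers slotsfree=8; the request value is hypothetical, since the actual -l request is not shown in this thread. The function name pe_slots_offered is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: how many PE slots fit on one host when a
# consumable is debited once per slot.
#   $1 = consumable capacity on the host (e.g. slotsfree=8)
#   $2 = amount requested per slot (assumed > 0; value is hypothetical)
#   $3 = slots configured in the queue instance
pe_slots_offered() {
    capacity=$1
    per_slot=$2
    queue_slots=$3
    # Integer division: whole slots that the consumable can cover
    fit=$(( capacity / per_slot ))
    # The queue's own slot count is still an upper bound
    if [ "$fit" -lt "$queue_slots" ]; then
        echo "$fit"
    else
        echo "$queue_slots"
    fi
}

# Capacity 8, assumed request of 4 per slot, 64 queue slots
pe_slots_offered 8 4 64   # prints 2
```

Under these assumptions a 4-slot job would need slotsfree=16 on the host, which would match an "only offers 2 slots" message. If the resource is meant to be counted once per job, some Grid Engine versions allow a per-job setting (JOB) in the consumable column of the complex definition; whether that applies depends on the installed version.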

Thanks for your help.

Regards
Arturo


On 15/05/12 14:33, William Hay wrote:
On 15 May 2012 13:05, Arturo<[email protected]>  wrote:
Hi,

I have a very strange behaviour when I try to use a parallel environment
with hard_queue_list option.

In my script I have a parallel configuration:

     #$ -pe openmpi 4

and if I submit the script in the following way, it works and runs on node
test@node045

     qsub script.sh

But if I submit the script using the hard_queue_list, it doesn't run:

     qsub -q test script.sh

With this error:

     cannot run in PE "openmpi" because it only offers 2 slots

Obviously, the node is always empty. What may be wrong?
It's hard to diagnose what's going on without knowing more about your
configuration.
Are you certain the entire job is running in the queue instance
test@node045 when you submit without a queue list?
One possibility is that queue test@node045 has only two slots.  The
master slot of the job plus one slave runs
in test@node045 while the remaining slots run elsewhere.

When the job is running what output do you get from qstat -g t?
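
As a rough checklist, these are the kinds of commands whose output usually helps with a problem like this. The helper below only prints the command lines and does not run them. The function name print_diag_commands and the argument values are illustrative assumptions, taken from names mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the Grid Engine commands
# that are useful for diagnosing a pending PE job.
#   $1 = job id, $2 = queue, $3 = host, $4 = PE name, $5 = complex name
print_diag_commands() {
    job_id=$1
    queue=$2
    host=$3
    pe=$4
    resource=$5
    printf '%s\n' \
        "qstat -g t" \
        "qstat -j ${job_id}" \
        "qconf -sq ${queue}" \
        "qconf -sp ${pe}" \
        "qconf -se ${host}" \
        "qstat -F ${resource} -q ${queue}@${host}"
}

# Example values taken from this thread
print_diag_commands 74550 test node045 openmpi slotsfree
```

qstat -g t shows where the master and slave tasks landed, qstat -j shows the scheduler's reasons for a pending job (if scheduling info is enabled), and the qconf commands show the queue, PE, and host configuration.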

William




--
Arturo Giner Gracia
HPC research group System Administrator
Instituto de Biocomputación y Física de Sistemas Complejos (BIFI)
Universidad de Zaragoza
e-mail: [email protected]
phone: (+34) 976762992

