Re: [SGE-discuss] scheduling info: cannot run in PE "mpi" because it only offers 2147483648 slots

Razvan Sultana Mon, 13 Jun 2016 08:21:45 -0700

I have actually found the source of the PE slots problem!

It was this complex value, which I had added to limit the total number of jobs of a certain type that can run simultaneously on the cluster:

connections       conn      INT       <=    YES YES        NONE     0

The default value was set to 'NONE' - which is probably represented as 2147483648 :)
This was wrong, because this is an INT complex, so when I changed the default to 0, e.g.:

connections       conn      INT       <=    YES YES        0     0

I stopped having the PE scheduling problem!
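For anyone who wants to check their own configuration for the same mistake, here is a minimal sketch. It assumes the complex list has been saved as text with the same eight columns as the entries quoted in this thread (name, shortcut, type, relop, requestable, consumable, default, urgency). The function name `suspicious_defaults` and the sample data are made up for illustration and are not part of SGE.

```python
import re

# Plain decimal number such as 0, 10 or 1.5 (assumption: this is how a valid
# default for an INT or DOUBLE consumable is written).
NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")

def suspicious_defaults(sc_output):
    """Return names of INT/DOUBLE consumables whose default is not a plain number."""
    flagged = []
    for line in sc_output.splitlines():
        fields = line.split()
        # Skip blank lines, the '#' header, and anything without 8 columns.
        if len(fields) != 8 or fields[0].startswith("#"):
            continue
        name, _shortcut, ctype, _relop, _req, consumable, default, _urgency = fields
        if ctype.upper() not in ("INT", "DOUBLE") or consumable.upper() == "NO":
            continue
        if not NUMBER.fullmatch(default):
            flagged.append(name)
    return flagged

# Hypothetical sample modelled on the entries quoted in this thread.
sample = """\
#name        shortcut  type  relop  requestable  consumable  default  urgency
connections  conn      INT   <=     YES          YES         NONE     0
exclusive    excl      BOOL  EXCL   YES          YES         0        1000
slots        s         INT   <=     YES          YES         1        1000
"""

print(suspicious_defaults(sample))
```

With the sample above, only the `connections` entry is reported, because its default is 'NONE' rather than a number.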

I thought this might be useful to other people who might make the same mistake!

Regards,
Razvan

On 10/06/16 17:15, Razvan Sultana wrote:

But looking at the discussion here:
https://arc.liv.ac.uk/trac/SGE/ticket/1429
I saw that you were referencing this ticket:
https://arc.liv.ac.uk/trac/SGE/ticket/793
where there is the same message that I've seen and you mention that EXCL might be to blame?
I have actually added this entry to the complex values:
exclusive            excl        BOOL      EXCL    YES YES 0        1000

I tried taking it out but I still see the same errors :(

Razvan

On 10/06/16 16:50, Razvan Sultana wrote:
Hi William,
I haven't touched the h_rt and s_rt values - they are by default INFINITY:
qconf -sq all.q | grep '_rt'
s_rt                  INFINITY
h_rt                  INFINITY

Razvan

On 10/06/16 15:49, William Hay wrote:
On Fri, Jun 10, 2016 at 03:07:53PM +0100, Razvan Sultana wrote:
the job just sits there in a 'qw' state, with this scheduling info showing:
scheduling info: cannot run in PE "mpi" because it only offers 2147483648 slots
I have tried everything I could think of - changing the number of slots in the
PE queue, changing the allocation rule, etc.
Nothing changed - all the jobs with `-pe mpi` fail to be scheduled.

This looks like a bug to me.
2147483648 is 0x80000000 and it's -2147483648 when seen as a signed int, so
5 > -2147483648
But of course, the number of available slots to the PE should be anything
but this number (I tried 9999, 99, 10 - no change).
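The arithmetic above can be checked with a short sketch. It only illustrates how the same 32 bits read as unsigned or signed. It makes no claim about how the scheduler stores the slot count internally, and the helper name `as_signed32` is made up for this example.

```python
import struct

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

reported = 2147483648                 # the slot count from the scheduling info
print(hex(reported))                  # 0x80000000
print(as_signed32(reported))          # -2147483648
print(5 > as_signed32(reported))      # True
```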
I tried looking in this (and other precursor) discussion list archives for
similar error messages and although it pops up from time to time, nobody
seems to know why that is or how to fix it.
Any suggestions to fix this issue?
Does your cluster have a particularly short default runtime:
https://arc.liv.ac.uk/trac/SGE/ticket/1429

William
_______________________________________________
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss

