Hello Michael,

##
Regarding the 'int_test' PE you created: if you set the allocation rule to an
integer, it means the job _must_ request a number of slots equal to that value
or a multiple of it.
In your case the PE is defined with '8' as the allocation rule, so your job
must request 8, 16, 24, ... slots. If you request 2, the job will never start,
because the scheduler cannot allocate 2 slots under an allocation rule of 8.
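
For example (with a hypothetical job script 'job.sh', and assuming int_test is
attached to a queue with enough free slots), submissions would behave roughly
like this:

    qsub -pe int_test 8  job.sh   # 8  = 1 x 8 -> can be scheduled
    qsub -pe int_test 16 job.sh   # 16 = 2 x 8 -> can be scheduled
    qsub -pe int_test 2  job.sh   # 2 is not a multiple of 8 -> stays pending forever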

From man sge_pe:
"If  the  number  of  tasks  specified with the "-pe" option (see qsub(1))
does not  divide  without  remainder  by this  <int>  the  job  will not be
scheduled. "

So the fact that a job in int_test never starts when it requests 2 cores is
completely expected from the scheduler's point of view.
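
You can double-check which rule is actually in effect with qconf; a rough
sketch of what to look for (field names from memory, your output may differ):

    $ qconf -sp int_test
    pe_name            int_test
    slots              ...
    allocation_rule    8
    ...

If you want jobs in this PE to be able to request an arbitrary number of slots,
an allocation rule such as $fill_up or $pe_slots (edited with 'qconf -mp
int_test') behaves differently from a fixed integer.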

##
Regarding this issue in general: I'm wondering whether you, or other users on
the cluster, use the '-R y' (reservation) option for their jobs? I have seen
this kind of behavior when someone submits a job with a reservation defined.
The scheduler reserves slots on the cluster for that big job and doesn't let
new jobs in (especially when no runtime is defined via h_rt). In that case
there are no messages in the scheduler log, which can be confusing.
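
If you want to check for this (commands from memory, so treat them as a rough
sketch rather than exact syntax): look at a big pending job and enable
scheduler monitoring so reservation decisions are written to the schedule file:

    $ qstat -j <jobid> | grep -i reserve           # shows whether the job was submitted with -R y
    $ qconf -msconf                                # add MONITOR=1 to the "params" line
    $ tail -f $SGE_ROOT/$SGE_CELL/common/schedule  # reservation decisions show up here

That usually makes it obvious whether a reservation for a large job is holding
slots that smaller jobs could otherwise use.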

Best regards,
Mikhail Serkov

On Fri, Aug 11, 2017 at 6:41 PM, Michael Stauffer <mgsta...@gmail.com>
wrote:

> Hi,
>
>
> Below I've dumped relevant configurations.
>
> Today I created a new PE called "int_test" to test the "integer"
> allocation rule. I set it to 16 (16 cores per node), and have also tried 8.
> It's been added as a PE to the queues we use. When I try to run on this new
> PE however, it *always* fails with the same "PE ...offers 0 slots" error,
> even if I can run the same multi-slot job using "unihost" PE at the same
> time. I'm not sure if this helps debug or not.
>
> Another thought - this behavior started happening some time ago more or
> less when I tried implementing fairshare behavior. I never seemed to get
> fairshare working right. We haven't been able to confirm, but for some
> users it seems this "PE 0 slots" issue pops up only after they've been
> running other jobs for a little while. So I'm wondering if I've screwed up
> fairshare in some way that's causing this odd behavior.
>
>
>
