Hello Michael,

## Regarding the 'int_test' PE you created:

If you set the allocation rule to an integer, it means that a job _must_ request a number of slots equal to that value or to a multiple of it. In your case the PE is defined with '8' as the allocation rule, so your job must request 8, 16, 24, ... slots. If you request 2, the job will never start, because the scheduler cannot allocate 2 slots with the allocation rule set to 8.
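For illustration, here is roughly how that plays out on the command line. The PE name 'int_test' and the allocation rule of 8 are from your setup; the job script name and the qconf output shown are only a sketch of what you would see:

    # Show the PE definition (output abbreviated to the relevant fields)
    $ qconf -sp int_test
    pe_name            int_test
    allocation_rule    8
    ...

    # Slot requests that are multiples of 8 can be scheduled:
    $ qsub -pe int_test 8  myjob.sh
    $ qsub -pe int_test 16 myjob.sh

    # A request of 2 is not a multiple of 8, so it will stay in qw forever:
    $ qsub -pe int_test 2  myjob.sh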
From man sge_pe: "If the number of tasks specified with the "-pe" option (see qsub(1)) does not divide without remainder by this <int> the job will not be scheduled." So the fact that a job in int_test never starts when it requests 2 cores is completely expected from the scheduler's point of view.

## Regarding this issue in general:

I'm wondering whether you, or other users on the cluster, use the '-R y' (reservation) option for their jobs? I have seen this kind of behavior when someone submits a job with a reservation defined. The scheduler reserves slots on the cluster for that big job and doesn't let new jobs start (especially when the runtime is not limited by h_rt). In that case there are no messages in the scheduler log, which can be confusing. I've added a small command-line sketch of what I mean below, after your quoted message.

Best regards,
Mikhail Serkov

On Fri, Aug 11, 2017 at 6:41 PM, Michael Stauffer <mgsta...@gmail.com> wrote:
> Hi,
>
> Below I've dumped relevant configurations.
>
> Today I created a new PE called "int_test" to test the "integer"
> allocation rule. I set it to 16 (16 cores per node), and have also tried 8.
> It's been added as a PE to the queues we use. When I try to run to this new
> PE however, it *always* fails with the same "PE ...offers 0 slots" error,
> even if I can run the same multi-slot job using "unihost" PE at the same
> time. I'm not sure if this helps debug or not.
>
> Another thought - this behavior started happening some time ago more or
> less when I tried implementing fairshare behavior. I never seemed to get
> fairshare working right. We haven't been able to confirm, but for some
> users it seems this "PE 0 slots" issue pops up only after they've been
> running other jobs for a little while. So I'm wondering if I've screwed up
> fairshare in some way that's causing this odd behavior.
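Here is the sketch I mentioned. The commands are standard SGE; the job script, PE, runtime and job ID are placeholders, so treat it as an illustration rather than your exact setup:

    # Check whether resource reservation is enabled in the scheduler config
    # (max_reservation > 0 means the scheduler will reserve slots for -R y jobs)
    $ qconf -ssconf | grep -E 'max_reservation|schedd_job_info'

    # A reserving job would typically have been submitted like this:
    $ qsub -R y -l h_rt=24:00:00 -pe unihost 64 big_job.sh

    # With schedd_job_info enabled, the "scheduling info" section of
    # qstat -j explains why a pending job is not being started:
    $ qstat -j <jobid>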