> ##
> In regard of the 'int_test' PE you created: if you set the allocation
> rule to an integer, the job _must_ request a number of slots equal to,
> or a multiple of, that value.
> In your case the PE is defined with '8' as the allocation rule, so a
> job must request 8, 16, 24, ... slots. If it requests 2, the job will
> never start, as the scheduler can't allocate 2 slots with the
> allocation rule set to 8.
>
> From man sge_pe:
> "If the number of tasks specified with the "-pe" option (see qsub(1))
> does not divide without remainder by this <int> the job will not be
> scheduled."
>
> So the fact that a job in int_test never starts when it requests 2
> cores is totally fine from the scheduler's point of view.
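(To make the rule above concrete, a minimal sketch, assuming
allocation_rule is set to 8 as described; 'job.sh' here stands in for
any job script:)

    # Show the PE definition, including its allocation_rule:
    qconf -sp int_test

    # With allocation_rule 8, only multiples of 8 are schedulable:
    qsub -pe int_test 8 job.sh    # can be scheduled
    qsub -pe int_test 16 job.sh   # can be scheduled
    qsub -pe int_test 2 job.sh    # pends forever: 2 is not a multiple of 8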
OK, thanks very much, that explains it. I'll test accordingly.

> ##
> In regard of this issue in general: just wondering whether you, or
> users on the cluster, use the '-R y' (reservation) option for their
> jobs? I have seen such behavior when someone submits a job with a
> reservation defined. The scheduler reserves slots on the cluster for
> this big job and doesn't let new jobs in (especially when the runtime
> is not defined by h_rt). In this case there will be no messages in the
> scheduler log, which can be confusing at times.

I don't think users are using '-R y', but I'm not sure. Do you know how
I can tell? (A sketch of places to look is at the end of this message.)
I think 'qstat -g c' shows that in the RES column? I don't think I've
ever seen a non-zero value there, but I'll pay attention. However, the
stuck-job issue is happening right now to at least one user, and the
RES column is all zeros.

-M

> Best regards,
> Mikhail Serkov
>
> On Fri, Aug 11, 2017 at 6:41 PM, Michael Stauffer <mgsta...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Below I've dumped relevant configurations.
>>
>> Today I created a new PE called "int_test" to test the "integer"
>> allocation rule. I set it to 16 (16 cores per node), and have also
>> tried 8. It's been added as a PE to the queues we use. When I try to
>> run a job in this new PE, however, it *always* fails with the same
>> "PE ...offers 0 slots" error, even if I can run the same multi-slot
>> job using the "unihost" PE at the same time. I'm not sure if this
>> helps debug or not.
>>
>> Another thought: this behavior started happening more or less when I
>> tried implementing fairshare behavior, which I never seemed to get
>> working right. We haven't been able to confirm it, but for some users
>> this "PE 0 slots" issue seems to pop up only after they've been
>> running other jobs for a little while. So I'm wondering if I've
>> screwed up fairshare in some way that's causing this odd behavior.
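(Re the '-R y' question above: a sketch of places to look, assuming a
default cell layout for the schedule file path; the MONITOR parameter is
documented in sched_conf(5):)

    # Cluster-wide summary: RES counts slots held by resource reservations
    qstat -g c

    # Reservations are only honored at all if max_reservation > 0:
    qconf -ssconf | grep -E 'max_reservation|params'

    # With MONITOR=1 added to the scheduler's 'params', every scheduling
    # run is logged to this file, including reservation decisions made
    # for jobs submitted with '-R y':
    less $SGE_ROOT/$SGE_CELL/common/schedule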