Hello,
Recently, in one of our clusters, we noticed production jobs going into
PENDING state due to insufficient CPU. The non-production jobs were
not preempted, because we haven't set the --preemption_delay flag on the
scheduler. The default value for this flag is 10mins. Why is it so
high? Is there any reasoning behind using 10mins as the default value?

We are thinking of using 2mins for this flag. We wouldn't want to
wait more than 2mins to run a prod job when resources are constrained. Does
that sound reasonable? What preemption delay do SREs typically use?

-- 
Regards,
Bhuvan Arumugam
www.livecipher.com