Hi there,

We run multiple large-scale application clusters on YARN, and one
observation is that jobs are often CPU-skewed due to topology or data
skew across subtasks. For better or worse, this skew leads to a few task
managers consuming many vcores while the majority consume far less. Our
goal is to reduce the total infra budget while keeping jobs running
smoothly.

Are there any ongoing discussions in this area? Naively, if we know from
previous runs that a few tasks (by UUID) consistently use higher vcores,
could we request one last batch of containers with a high-vcore resource
profile and reassign those tasks to them?
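For context, one existing mechanism that comes close to this (assuming this is a Flink job, as the mention of task managers suggests) is fine-grained resource management, available since Flink 1.14: a SlotSharingGroup with an explicit resource spec can be attached to the known-heavy operators so their slots are sized differently from the rest of the job. A minimal sketch, with illustrative operator choices and resource numbers (this also requires fine-grained resource management to be enabled on the cluster):

```java
import org.apache.flink.api.common.operators.SlotSharingGroup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SkewAwareResourcesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Resource spec for the known-heavy subtasks: more CPU than the
        // default slots. The numbers here are purely illustrative.
        SlotSharingGroup heavy = SlotSharingGroup.newBuilder("heavy")
                .setCpuCores(4.0)
                .setTaskHeapMemoryMB(2048)
                .build();

        // Smaller spec for everything else.
        SlotSharingGroup light = SlotSharingGroup.newBuilder("light")
                .setCpuCores(1.0)
                .setTaskHeapMemoryMB(512)
                .build();

        env.fromElements("a", "b", "c")
                .map(String::toUpperCase).slotSharingGroup(light)
                .keyBy(s -> s)
                // Hypothetical CPU-skewed stage pinned to the big slots.
                .reduce((x, y) -> x + y).slotSharingGroup(heavy)
                .print();

        env.execute("skew-aware-resources");
    }
}
```

With this, the resource manager requests differently sized slots (and, on YARN, differently sized containers) for the two groups, which is close in spirit to "one last batch of high-vcore containers" without task-level reassignment.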

Thanks,
Chen
