Hi there,

We run multiple large-scale applications on YARN clusters, and one recurring observation is that jobs are often CPU-skewed due to topology or data skew across subtasks. For better or worse, this skew means a few task managers consume a large number of vcores while the majority consume far less. Our goal is to reduce the total infra budget while keeping the jobs running smoothly.
Are there any ongoing discussions in this area? Naively, if we know from previous runs that a few tasks (identified by their UUIDs) use more vcores, could we request one last batch of containers with a high-vcore resource profile and reassign those tasks to them?

Thanks,
Chen
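
P.S. To illustrate what we have in mind: something along the lines of Flink's fine-grained resource management (available since 1.14), where a slot sharing group with a larger CPU budget is declared and the hot operator is pinned to it, so the scheduler requests matching containers from YARN. This is only a sketch of the direction, not our actual job; the group name "cpu-heavy" and all resource numbers below are made up.

```java
import org.apache.flink.api.common.operators.SlotSharingGroup;
import org.apache.flink.configuration.MemorySize;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SkewSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical high-vcore profile for the few CPU-heavy subtasks;
        // the numbers here are placeholders, not a recommendation.
        SlotSharingGroup cpuHeavy = SlotSharingGroup.newBuilder("cpu-heavy")
                .setCpuCores(4.0)
                .setTaskHeapMemory(MemorySize.ofMebiBytes(1024))
                .build();
        env.registerSlotSharingGroup(cpuHeavy);

        env.fromElements(1, 2, 3)
                // Pin the expensive operator to the high-vcore group; all other
                // operators stay in the default group with smaller slots.
                .map(x -> x * x).slotSharingGroup("cpu-heavy")
                .print();

        env.execute("skew-sketch");
    }
}
```

The open question for us is the reassignment part: this API sizes slots up front, whereas we would want to feed vcore observations from previous runs back into the profile.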