In several jobs with relatively long pipelines, I'm noticing that the
autoscaler leads to very unbalanced work distribution, to the point that
the jobs work much better when I simply disable it.

For example, the attached screenshot shows the TM CPU usage. As you can
see, one TM was doing most of the work until I disabled the autoscaler;
after that, the work was evenly distributed across the three TMs (note
that the number of TMs is kept constant). I can share my autoscaler
settings if that would help with troubleshooting, but in the meantime I
have a couple of related questions:

1. Is it possible to restrict the autoscaling decisions to those that
change the number of TMs? When scaling down, for example, I want to lower
the number of replicas; otherwise the downscaling gains me very little.

2. Is it possible to make the autoscaler work with a single global
(pipeline-wide) scaling factor instead of one per vertex?

Together, 1 and 2 would autoscale jobs purely by adding or removing
replicas, which is what I've typically done by hand. Does that make sense,
or am I missing something?

Regards,

Salva
