aplyusnin commented on PR #847: URL: https://github.com/apache/flink-kubernetes-operator/pull/847#issuecomment-2234045952
Thank you for your replies!

I don't understand how we can determine whether a vertex is a bottleneck without evaluating its parallelism. This is why TRUE_PROCESSING_RATE is used.

Also, I think the simpler approach is not accurate enough. Suppose we have a window join operator with two upstreams. Its target_data_rate is calculated as: `target_data_rate_join = target_data_rate_upstream_1 + target_data_rate_upstream_2` (output ratios are 1 for simplicity). If the join operator is a bottleneck, then its `actual_target_data_rate_join` is lower than `target_data_rate_join`. Then, by the backpropagation rule, the new actual target data rates of upstream_1 and upstream_2 are each limited by `actual_target_data_rate_join`.

This is where the accuracy problems appear. First, `actual_target_data_rate_join` can still be greater than `target_data_rate_upstream_1` or `target_data_rate_upstream_2`, in which case that upstream's target_data_rate remains unchanged. Second, `actual_target_data_rate_join` can be less than an upstream's target_data_rate, in which case the upstream's rate is set equal to `actual_target_data_rate_join`. But then the join's target_data_rate, being the sum of the two, will be twice what was expected. In both cases, upstream_1 and upstream_2 will remain blocked after scaling. This is why the simpler approach may not be good enough.
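The two cases can be reproduced with a small numeric sketch. This is only an illustration of the argument, not code from the operator: the function names `backpropagate_simple` and `join_input_rate`, and all of the rates, are hypothetical, and the "simpler approach" is assumed to cap each upstream independently at the join's achievable rate.

```python
def backpropagate_simple(upstream_targets, actual_join_rate):
    """Hypothetical 'simple' rule: cap each upstream on its own
    at the rate the bottlenecked join can actually sustain."""
    return [min(rate, actual_join_rate) for rate in upstream_targets]


def join_input_rate(upstream_rates):
    """Join input is the sum of its upstream rates (output ratios are 1)."""
    return sum(upstream_rates)


# Illustrative numbers only: two upstreams, each targeting 100 records/s,
# so target_data_rate_join = 200.
upstream_targets = [100.0, 100.0]

# Case 1: the join can sustain 150, which exceeds each upstream's own target.
# Neither upstream is capped, so the join still receives 200 > 150.
case1_rates = backpropagate_simple(upstream_targets, 150.0)
print(case1_rates, join_input_rate(case1_rates))  # [100.0, 100.0] 200.0

# Case 2: the join can sustain only 60, below each upstream's target.
# Both upstreams are capped to 60, so the join receives 120 = 2 * 60.
case2_rates = backpropagate_simple(upstream_targets, 60.0)
print(case2_rates, join_input_rate(case2_rates))  # [60.0, 60.0] 120.0
```

In both runs the summed upstream rate stays above what the join can process, which is the blocking described in the comment.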