Re: [PR] [FLINK-31215] [autoscaler] Backpropagate processing rate limits from non-scalable bottlenecks to upstream operators [flink-kubernetes-operator]

via GitHub Wed, 17 Jul 2024 12:06:48 -0700


aplyusnin commented on PR #847:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/847#issuecomment-2234045952


   Thank you for your replies!
   
   I don't understand how we can determine if a vertex is a bottleneck without 
evaluating its parallelism. This is why TRUE_PROCESSING_RATE is used. 
   
   Also, I think the simpler approach is not accurate enough. 
Suppose we have a window join operator with two upstreams. Its target_data_rate 
is calculated as:
   
    `target_data_rate_join = target_data_rate_upstream_1 + 
target_data_rate_upstream_2` (output ratios are 1 for simplicity).
   
   If the join operator is a bottleneck, then its 
`actual_target_data_rate_join` is lower than `target_data_rate_join`. Then, by 
the backpropagation rule, the new actual_target_data_rates of upstream_1 
and upstream_2 are each limited by `actual_target_data_rate_join`.
   
   This is where problems with accuracy appear. 
   
   The `actual_target_data_rate_join` can still be greater than 
`target_data_rate_upstream_1` or `target_data_rate_upstream_2`. In that case 
the limit does not apply and that upstream's target_data_rate remains unchanged. 
   
   Alternatively, the `actual_target_data_rate_join` can be less than the 
target_data_rate of the upstreams, making each of them equal to 
`actual_target_data_rate_join`. But then the target_data_rate of the join, 
being the sum of the two, will be twice what was expected.  
   
   In both cases, the upstream_1 and upstream_2 operators will remain blocked 
after scaling, because together they still send the join more than it can 
process. This is why the simpler approach may not be good enough. 
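
   To make the two cases concrete, here is a minimal Python sketch. It is not 
code from the PR: the function names and the numbers are made up for 
illustration, output ratios are assumed to be 1, and the "simpler approach" is 
modelled as capping each upstream independently at the join's achievable rate.

```python
def naive_backpropagate(upstream_targets, actual_join_rate):
    """Cap each upstream target independently at the join's achievable rate.

    Hypothetical model of the simpler backpropagation rule discussed above.
    """
    return [min(rate, actual_join_rate) for rate in upstream_targets]


def join_input_rate(upstream_rates):
    """The join's input is the sum of its upstream rates (output ratios = 1)."""
    return sum(upstream_rates)


# Illustrative values for target_data_rate_upstream_1 and _2.
upstream_targets = [100, 100]

# Case 1: the join can sustain 150, which is above each upstream target,
# so neither upstream is limited and the join still receives 200.
case1 = naive_backpropagate(upstream_targets, 150)

# Case 2: the join can sustain only 60, so both upstreams are capped at 60,
# but the join then receives 120, twice what it can process.
case2 = naive_backpropagate(upstream_targets, 60)

for limit, capped in ((150, case1), (60, case2)):
    total = join_input_rate(capped)
    print(f"join limit={limit} upstreams={capped} join input={total} "
          f"still overloaded={total > limit}")
```

   In both runs the summed input exceeds what the join can handle, which is 
the inaccuracy described above: a per-upstream cap ignores that the join's 
limit has to be shared between its inputs.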


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
