[ https://issues.apache.org/jira/browse/FLINK-33764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-33764:
-----------------------------------
    Labels: pull-request-available  (was: )

> Incorporate GC / Heap metrics in autoscaler decisions
> -----------------------------------------------------
>
>                 Key: FLINK-33764
>                 URL: https://issues.apache.org/jira/browse/FLINK-33764
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Gyula Fora
>            Assignee: Gyula Fora
>            Priority: Major
>              Labels: pull-request-available
>
> The autoscaler currently doesn't use any GC/heap metrics as part of its
> scaling decisions.
> While the long-term goal may be to support vertical scaling (increasing TM
> sizes), this is currently out of scope for the autoscaler.
> However, it is very important to detect cases where the throughput of certain
> vertices or of the entire pipeline is critically affected by long GC pauses.
> In these cases, the current autoscaler logic would wrongly assume a low true
> processing rate and scale the pipeline up too far, ramping up costs and
> causing further issues.
> Using the improved GC metrics introduced in
> https://issues.apache.org/jira/browse/FLINK-33318, we should measure the GC
> pauses and simply block scaling decisions if the pipeline spends too much
> time garbage collecting, and notify the user that the required action is to
> increase memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)