1996fanrui commented on code in PR #726:
URL: https://github.com/apache/flink-kubernetes-operator/pull/726#discussion_r1423346604
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/config/AutoScalerOptions.java:
##########
@@ -222,6 +222,18 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
                     .withDescription(
                             "Processing rate increase threshold for detecting ineffective scaling threshold. 0.1 means if we do not accomplish at least 10% of the desired capacity increase with scaling, the action is marked ineffective.");

+    public static final ConfigOption<Double> GC_PRESSURE_THRESHOLD =
+            autoScalerConfig("memory.gc-pressure.threshold")
+                    .doubleType()
+                    .defaultValue(0.3)
+                    .withDescription("Max allowed GC pressure during scaling operations");
+
+    public static final ConfigOption<Double> HEAP_USAGE_THRESHOLD =
+            autoScalerConfig("memory.heap-usage.threshold")
+                    .doubleType()
+                    .defaultValue(0.9)

Review Comment:
> Also keep in mind that this is the average heap usage. With 90% average usage you are extremely likely to be close to out of heap in most cases.

Thanks @gyfora for the clarification! I don't think it is actually the average heap usage, and I want to check my understanding with you first.

In the `ScalingExecutor#isJobUnderMemoryPressure` method, we check whether `evaluatedMetrics.get(ScalingMetric.HEAP_USAGE).getAverage()` > `conf.get(AutoScalerOptions.HEAP_USAGE_THRESHOLD)`. Intuitively, `getAverage` looks like an average, but it is calculated in two steps:

- Step 1: `ScalingMetrics#computeGlobalMetrics` collects `HEAP_USAGE` at each collection time as `heapUsed.getMax() / heapMax.getMax()`.
  - IIUC, `heapUsed` is an `AggregatedMetric`: when a job has 1000 TaskManagers, if the heap usage of 999 TMs is very low and only one TM is high, we still consider `heapUsed` high at that time.
- Step 2: `ScalingMetricEvaluator#evaluateGlobalMetrics` computes `HEAP_USAGE` based on the `metricHistory`.
  - So the `metricHistory` is composed of the TMs with the highest heap usage at a large number of time points.

Strictly speaking, both steps have problems:

- Step 1: Java GC runs lazily, not immediately.
  - When a TM's heap usage is high, it may just be that GC has not been triggered yet, which does not mean the memory pressure is high.
  - This is especially true if the heap usage is high for only one TM or a small number of TMs.
- Step 2: Since the data from the first step is unreliable, the average computed in the second step is unreliable as well.

> GC metrics will only be available in Flink 1.19.

I'm not sure whether we can sum all the per-collector GC times as the total GC time? Before 1.19, detailed GC times are available for each garbage collector.

> This is a very good point and happens often. I think we could definitely build this logic on top of the newly introduced metrics + scaling history as a follow up. It would probably be a very good addition. (but definitely out of scope for this PR)

That makes sense. As I understand it, it would be better to revert the scaling if the job is unhealthy after scaling down, and memory pressure is just one kind of unhealthiness; checkpoint failures or CPU pressure may be unhealthy as well. Would you mind if I create a JIRA and pick it up? Thanks~
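For reference, below is a minimal, self-contained sketch of the two-step `HEAP_USAGE` computation described in the comment above. It is only an illustration of the behaviour under discussion, not the actual `ScalingMetrics` / `ScalingMetricEvaluator` code: the `Snapshot` record, the numbers, and the `main` driver are all hypothetical.

```java
import java.util.List;

/**
 * Illustration (not Flink autoscaler source) of the two-step HEAP_USAGE
 * computation discussed above: per-snapshot value = max heap used across
 * TMs / max heap size across TMs, then an average over the history.
 */
public class HeapUsageSketch {

    /** Hypothetical stand-in for one collection interval: heap used/max per TaskManager. */
    record Snapshot(double[] heapUsedPerTm, double[] heapMaxPerTm) {

        /** Step 1: per-snapshot HEAP_USAGE = heapUsed.getMax() / heapMax.getMax(). */
        double heapUsage() {
            double usedMax = 0;
            double capMax = 0;
            for (double v : heapUsedPerTm) usedMax = Math.max(usedMax, v);
            for (double v : heapMaxPerTm) capMax = Math.max(capMax, v);
            return usedMax / capMax;
        }
    }

    /** Step 2: the "average" over the metric history, i.e. an average of per-snapshot maxima. */
    static double averageHeapUsage(List<Snapshot> history) {
        return history.stream().mapToDouble(Snapshot::heapUsage).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // 1000 TMs, all at ~20% of a 1 GiB heap, except one TM that sits at ~95%
        // (e.g. because GC simply has not run yet).
        int tms = 1000;
        double heapMax = 1024;
        double[] used = new double[tms];
        double[] max = new double[tms];
        for (int i = 0; i < tms; i++) {
            used[i] = 0.2 * heapMax;
            max[i] = heapMax;
        }
        used[0] = 0.95 * heapMax; // single outlier TM

        List<Snapshot> history = List.of(
                new Snapshot(used, max),
                new Snapshot(used, max),
                new Snapshot(used, max));

        double avg = averageHeapUsage(history);
        double threshold = 0.9; // AutoScalerOptions.HEAP_USAGE_THRESHOLD default

        // Prints ~0.95 > 0.9, so the job would be treated as under memory pressure
        // even though 999 of 1000 TMs have almost idle heaps.
        System.out.printf("average HEAP_USAGE = %.2f, under memory pressure: %b%n",
                avg, avg > threshold);
    }
}
```

With a single hot TM out of 1000, every per-snapshot value is ~0.95, so the resulting "average" is also ~0.95 and crosses the 0.9 default threshold, which matches the concern raised in the comment.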