[GitHub] [flink] HuangZhenQiu commented on a change in pull request #8952: [FLINK-10868][flink-runtime] Add failure rater for resource manager

GitBox Tue, 22 Dec 2020 23:05:11 -0800


HuangZhenQiu commented on a change in pull request #8952:
URL: https://github.com/apache/flink/pull/8952#discussion_r547734136




##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java
##########
@@ -67,6 +67,33 @@
                        "for streaming workloads, which may fail if there are not enough slots. Note that this configuration option does not take " +
                        "effect for standalone clusters, where how many slots are allocated is not controlled by Flink.");
 
+       /**
+        * Defines the maximum number of worker (YARN / Mesos) failures per minute before rejecting subsequent worker
+        * requests until the failure rate falls below the maximum. It is meant to quickly detect worker failures
+        * caused by external dependencies and to wait for the retry interval before sending new requests. By default,
+        * it is set to -1.0, which disables the feature.
+        */
+       public static final ConfigOption<Double> MAXIMUM_WORKERS_FAILURE_RATE = ConfigOptions
+               .key("resourcemanager.maximum-workers-failure-rate")
+               .doubleType()
+               .defaultValue(-1.0)

Review comment:
       -1.0 was the original default, from before the retry-interval logic was combined into this feature. I agree that a reasonable value, such as 10 failures per minute, should be the default.
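The Javadoc in the diff describes a per-minute threshold: once worker failures within the last minute reach the configured maximum, new worker requests are held back until older failures age out, and a negative value disables the check. A minimal sketch of that rule in plain Java follows. It assumes a one-minute sliding window of failure timestamps and a "reject when count >= threshold" comparison. The class `WorkerFailureRater` and its methods are hypothetical and are not the implementation in this pull request. The retry interval mentioned in the Javadoc is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a per-minute worker failure rater (not Flink's actual code).
 * Failures are kept as timestamps; only those inside the last minute count.
 */
final class WorkerFailureRater {
    private static final long WINDOW_MILLIS = 60_000L; // one-minute sliding window

    private final double maxFailuresPerMinute;        // negative values disable the check
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    WorkerFailureRater(double maxFailuresPerMinute) {
        this.maxFailuresPerMinute = maxFailuresPerMinute;
    }

    /** Records a worker failure observed at the given time in milliseconds. */
    void markFailure(long nowMillis) {
        failureTimestamps.addLast(nowMillis);
        evictExpired(nowMillis);
    }

    /** Returns true if new worker requests should be held back at the given time. */
    boolean shouldRejectRequests(long nowMillis) {
        if (maxFailuresPerMinute < 0) {
            return false; // feature disabled, as with the -1.0 default in the diff
        }
        evictExpired(nowMillis);
        return failureTimestamps.size() >= maxFailuresPerMinute;
    }

    /** Drops failures that fall outside the one-minute window ending at nowMillis. */
    private void evictExpired(long nowMillis) {
        while (!failureTimestamps.isEmpty()
                && failureTimestamps.peekFirst() <= nowMillis - WINDOW_MILLIS) {
            failureTimestamps.removeFirst();
        }
    }
}
```

For example, with the 10 failures per minute suggested above, ten failures recorded at 0 s through 9 s make `shouldRejectRequests(10_000L)` return `true`. At 61 s, the failures from 0 s and 1 s have left the window and only eight remain, so requests are accepted again.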




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
