1996fanrui commented on code in PR #24003: URL: https://github.com/apache/flink/pull/24003#discussion_r1456812210
########## flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/failover/ExecutionFailureHandlerTest.java: ########## @@ -183,6 +194,55 @@ void testNonRecoverableFailureHandlingResult() throws Exception { assertThat(executionFailureHandler.getNumberOfRestarts()).isZero(); } + /** Test isNewAttempt of {@link FailureHandlingResult} is expected. */ + @Test + void testNewAttemptAndNumberOfRestarts() throws Exception { Review Comment: Good suggestion! Also, I extracted the `assertHandlerRootException` and `assertHandlerConcurrentException` methods, after that, the `testNewAttemptAndNumberOfRestarts` is totally simple. ########## flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/FixedDelayRestartBackoffTimeStrategy.java: ########## @@ -61,8 +61,9 @@ public long getBackoffTime() { } @Override - public void notifyFailure(Throwable cause) { + public boolean notifyFailure(Throwable cause) { currentRestartAttempt++; + return true; Review Comment: > But from a conceptual point of view: shouldn't we cover it also for the two other restart strategies. Or am I missing something here? In the beginning, I want to improve all restart strategies, but I meet some feedback to removing other restart strategies during discussion. The core background is this thread: https://lists.apache.org/thread/l7wyc7pndpsvh2h7hj3fw2td9yphrlox In brief, 3 reasons: 1. The semantics of option - the failure-rate strategy's restart upper limit option is named `restart-strategy.failure-rate.max-failures-per-interval` - It's `max-failures-per-interval` instead of `max-attempts-per-interval`. - If we improve it directly, the name and behaviour aren't matched. 2. We recommend users use the `exponential-delay restart strategy` in the future, it's more powerful. 3. After FLIP-360 discussion, I found `exponential-delay restart strategy` can replace other restart strategies directly if users set the `restart-strategy.exponential-delay.backoff-multiplier` = 1 Actually, I think other restart-strategies can be deprecated in the future. ########## flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/exceptionhistory/RootExceptionHistoryEntryTest.java: ########## @@ -81,10 +81,14 @@ void testFromFailureHandlingResultSnapshot() throws ExecutionException, Interrup final CompletableFuture<Map<String, String>> rootFailureLabels = CompletableFuture.completedFuture(Collections.singletonMap("key", "value")); - final Throwable concurrentException = new IllegalStateException("Expected other failure"); - final ExecutionVertex concurrentlyFailedExecutionVertex = extractExecutionVertex(1); - final long concurrentExceptionTimestamp = - triggerFailure(concurrentlyFailedExecutionVertex, concurrentException); + final Throwable concurrentException1 = new IllegalStateException("Expected other failure1"); + final ExecutionVertex concurrentlyFailedExecutionVertex1 = extractExecutionVertex(1); + Predicate<ExceptionHistoryEntry> exception1Predicate = + getExceptionHistoryEntryPredicate( + concurrentException1, concurrentlyFailedExecutionVertex1); + + final Throwable concurrentException2 = new IllegalStateException("Expected other failure2"); + final ExecutionVertex concurrentlyFailedExecutionVertex2 = extractExecutionVertex(2); Review Comment: Thanks for the explanation! > testAddConecurrentExceptions will use all code of this test If we have 2 tests, and test1 only call `testCommon`, and test2 call `testCommon` and `testPart2`. It means test2 can cover test1. 2 solutions: - keep test1 and test2 - Only keep test2 In the current scenario, is it enough for us to only keep test2? Looking forward to your opinion, fine with me as well. 😁 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org