pnowojski commented on a change in pull request #14968: URL: https://github.com/apache/flink/pull/14968#discussion_r580080652
##########
File path: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/SourceStreamTaskTest.java
##########
@@ -856,4 +887,41 @@ private void output(String record) {
             output.collect(new StreamRecord<>(record));
         }
     }
+
+    /**
+     * This source sleeps a little bit before processing cancellation and records whether it was
+     * interrupted by the {@link SourceStreamTask} or not.
+     */
+    private static class WasInterruptedTestingSource implements SourceFunction<String> {
+        private static final int MAX_POST_CANCEL_ITERATIONS = 100;
+        private static final long serialVersionUID = 1L;
+        private volatile boolean running;
+
+        public static boolean wasInterrupted;
+
+        @Override
+        public void run(SourceContext<String> ctx) throws Exception {
+            wasInterrupted = false;
+            try {
+                int postCancelIterations = 0;
+                while (running || postCancelIterations < MAX_POST_CANCEL_ITERATIONS) {

Review comment:
Yes, even without the sleep the test can produce a false negative from time to time. Since that requires an unexpected sleep/hiccup of 100ms (or more), the chances of it happening look pretty low (`<10%`?), so the test would catch a regression quite quickly.

This test checks that:
1. we have a source that ignores cancellation for `x` milliseconds
2. within those `x` ms the source thread was not interrupted.

`notifyCheckpointCompleteAsync`? Did you mean `StreamTask#notifyCheckpointComplete` -> `CheckpointListener#notifyCheckpointComplete`? Exiting on `CheckpointListener#notifyCheckpointComplete` would still require a sleep to check that the interruption was not issued after it. Also, in the stop-with-savepoint case, `CheckpointListener#notifyCheckpointComplete` and `SourceFunction#cancel` are both called from `StreamTask#notifyCheckpointComplete`, one after another. As a matter of fact, `notifyCheckpointComplete()` is called before `cancel()`, so exiting there would require an ever so slightly longer/larger `MAX_POST_CANCEL_ITERATIONS`.

I don't see how we can reliably test this without some sleep-based probability window. To do so, we would need a notification hook that is always triggered **after** the interruption, so that the source would know that, if the interruption was supposed to happen, it would already have happened. But we don't have anything like that.
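
To make the "probability window" idea concrete, here is a minimal, self-contained sketch in plain Java with no Flink dependency. It is not the test from this PR: `InterruptWindowSketch`, `LingeringWorker`, and `runAndCancel` are hypothetical names, and the 1 ms sleep per iteration is an assumption chosen so that the window is roughly 100 ms. Only `running`, `wasInterrupted`, and `MAX_POST_CANCEL_ITERATIONS` mirror the diff above.

```java
public class InterruptWindowSketch {

    /** Hypothetical stand-in for the source: it keeps looping for a while after cancel(). */
    static final class LingeringWorker implements Runnable {
        private static final int MAX_POST_CANCEL_ITERATIONS = 100;

        private volatile boolean running = true;
        private volatile boolean wasInterrupted;

        @Override
        public void run() {
            int postCancelIterations = 0;
            try {
                // Keep going after cancellation so that a late interrupt can still be observed.
                while (running || postCancelIterations < MAX_POST_CANCEL_ITERATIONS) {
                    if (!running) {
                        postCancelIterations++;
                    }
                    // Assumed 1 ms step: sleep() throws if the thread is (or becomes) interrupted.
                    Thread.sleep(1);
                }
            } catch (InterruptedException e) {
                wasInterrupted = true;
            }
        }

        void cancel() {
            running = false;
        }

        boolean wasInterrupted() {
            return wasInterrupted;
        }
    }

    /** Cancels the worker, optionally interrupts it, and reports what the worker observed. */
    static boolean runAndCancel(boolean interruptOnCancel) {
        LingeringWorker worker = new LingeringWorker();
        Thread thread = new Thread(worker);
        thread.start();
        worker.cancel();
        if (interruptOnCancel) {
            thread.interrupt();
        }
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Calling thread was interrupted while waiting", e);
        }
        return worker.wasInterrupted();
    }

    public static void main(String[] args) {
        System.out.println("cancel only: interrupted=" + runAndCancel(false));
        System.out.println("cancel + interrupt: interrupted=" + runAndCancel(true));
    }
}
```

In this sketch the interrupt is issued right after `cancel()`, so it is always seen. In the real task it could arrive later, and an interrupt that lands after the window has closed goes unnoticed. That is the false negative discussed above, and without a hook that fires after the interruption, the window can only be widened, not eliminated.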