[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339537#comment-17339537 ]
Matthias commented on FLINK-20654:
----------------------------------

[~AHeise] what's the plan with the trace logging being enabled for UnalignedCheckpoints? Shall we have a Jira issue for disabling it again after a certain amount of time? I came across it when investigating the file size of the Maven logs (the one I looked at was 7.9 GB): 25,458,266 of its 26,712,993 lines (roughly 95%) were due to the trace logging of the {{NetworkActionsLogger}}.

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
>                 Key: FLINK-20654
>                 URL: https://issues.apache.org/jira/browse/FLINK-20654
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Arvid Heise
>            Assignee: Piotr Nowojski
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.13.0, 1.12.3
>
>
> The fix for FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed up for me
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be
> in [0;100000] has been mis-deserialized)
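The share of log lines quoted in the comment can be checked with a small script. This is a minimal sketch, not the tooling used for the comment. It assumes that each log line produced by the logger contains its class name (`NetworkActionsLogger`), which depends on the log layout in use. The function name `logger_share` is hypothetical.

```python
def logger_share(log_path: str, logger_name: str = "NetworkActionsLogger") -> tuple[int, int, float]:
    """Count the lines of a log file that mention logger_name.

    Returns (matching_lines, total_lines, fraction). Assumes (hypothetically)
    that the log layout prints the logger's class name on every line it emits.
    """
    matching = 0
    total = 0
    # Stream line by line so that multi-GB CI logs are never loaded into memory at once.
    with open(log_path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            if logger_name in line:
                matching += 1
    # Guard against an empty file to avoid division by zero.
    fraction = matching / total if total else 0.0
    return matching, total, fraction
```

For example, `m, t, frac = logger_share("maven.log")` followed by `print(f"{m:,} of {t:,} lines ({frac:.1%})")` formats the result. With the counts reported in the comment, that output would read about 95.3%. A plain substring match may over-count lines that only mention the class name in a stack trace, so the result is an upper bound.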