[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340073#comment-17340073 ]
Piotr Nowojski commented on FLINK-20654:
----------------------------------------

[~mapohl] and [~trohrmann], could you elaborate on what exactly the problem is? Why do you need to analyse the UCITCase/UCStressITCase/UCRescalingITCase logs?

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
>                 Key: FLINK-20654
>                 URL: https://issues.apache.org/jira/browse/FLINK-20654
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Arvid Heise
>            Assignee: Piotr Nowojski
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.13.0, 1.12.3
>
>
> The fix for FLINK-20433 shows potential corruption after recovery for all
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple of hundred times. The issue showed up for me
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be
> in [0;100000] has been mis-deserialized)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)