[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-20654:
-----------------------------------
    Priority: Critical  (was: Blocker)

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---------------------------------------------------------------
>
>                 Key: FLINK-20654
>                 URL: https://issues.apache.org/jira/browse/FLINK-20654
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.0, 1.12.1
>            Reporter: Arvid Heise
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.12.2, 1.13.0
>
>
> The fix for FLINK-20433 reveals potential data corruption after recovery in
> all variations of UnalignedCheckpointITCase.
> To reproduce, run UnalignedCheckpointITCase a couple of hundred times. For me,
> the issue showed up in the following cases, listed in order of decreasing
> frequency:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> The issue manifests as one of the following:
> - a stream corrupted exception
> - an EOF exception
> - an assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) an ArithmeticException due to overflow, because a number that
>   should lie in [0; 100000] was mis-deserialized

--
This message was sent by Atlassian Jira
(v8.3.4#803005)