[ https://issues.apache.org/jira/browse/KAFKA-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810946#comment-16810946 ]
ASF GitHub Bot commented on KAFKA-7192:
---------------------------------------

guozhangwang commented on pull request #6546: KAFKA-7192: Cherry-pick 5430 to 1.1
URL: https://github.com/apache/kafka/pull/6546

The first PR of KAFKA-7192 was cherry-picked to 1.1, but the follow-up (https://github.com/apache/kafka/pull/5430) was not. This is causing flaky EOS system test failures.

Some test results:

In the 2.0 branch, running 25 times (streams_eos_test has 4 tests, so 100 test runs in total), no failures:
http://confluent-kafka-2-0-system-test-results.s3-us-west-2.amazonaws.com/2019-04-05--001.1554466177--apache--2.0--db22e3d/report.html

In the 1.1 branch before this PR, running 5 times, 10 tests failed:
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-04-02--001.1554239700--guozhangwang--KMinor-1.1-eos-test--8395fce/report.html

In this branch (after this PR), running 25 times, no failures:
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2019-04-05--001.1554465488--guozhangwang--KMinor-1.1-eos-test--897aa03/report.html

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> State-store can desynchronise with changelog
> --------------------------------------------
>
>                 Key: KAFKA-7192
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7192
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.3, 1.0.2, 1.1.1, 2.0.0
>            Reporter: Jon Bates
>            Assignee: Guozhang Wang
>            Priority: Critical
>              Labels: bugs
>             Fix For: 0.11.0.4, 1.0.3, 1.1.2, 2.0.1, 2.1.0
>
>
> n.b. this bug has been verified with exactly-once processing enabled
> Consider the following scenario:
>  * A record, N is read into a Kafka topology
>  * the state store is updated
>  * the topology crashes
> h3. *Expected behaviour:*
>  # Node is restarted
>  # Offset was never updated, so record N is reprocessed
>  # State-store is reset to position N-1
>  # Record is reprocessed
> h3. *Actual Behaviour*
>  # Node is restarted
>  # Record N is reprocessed (good)
>  # The state store has the state from the previous processing
> I'd consider this a corruption of the state-store, hence the critical
> Priority, although High may be more appropriate.
> I wrote a proof-of-concept here, which demonstrates the problem on Linux:
> [https://github.com/spadger/kafka-streams-sad-state-store]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
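
The scenario quoted above (record N read, state store updated, crash before the offset is committed) can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyTask`, `run_scenario`, and the per-key counting logic are hypothetical names invented here, not Kafka Streams classes, and rebuilding the store from the changelog is just one way to get the "expected behaviour" in this model, not necessarily what the actual fix does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical model of one stream task; not a Kafka Streams class."""

    local_store: dict = field(default_factory=dict)  # on disk, survives a crash
    changelog: list = field(default_factory=list)    # committed (key, count) updates
    committed_offset: int = 0                        # next input offset to read

    def process(self, records, crash_at=None):
        """Count keys one record at a time; optionally crash before a commit."""
        while self.committed_offset < len(records):
            offset = self.committed_offset
            key = records[offset]
            count = self.local_store.get(key, 0) + 1
            self.local_store[key] = count  # the state store is updated
            if offset == crash_at:
                return  # crash: changelog write and offset commit never happen
            self.changelog.append((key, count))
            self.committed_offset = offset + 1

    def restart(self, restore_from_changelog):
        """Come back up, optionally rebuilding the store from the changelog."""
        if restore_from_changelog:
            self.local_store = {}
            for key, count in self.changelog:
                self.local_store[key] = count


def run_scenario(restore_from_changelog):
    records = ["a", "a"]  # record N is the second "a" (offset 1)
    task = ToyTask()
    task.process(records, crash_at=1)
    task.restart(restore_from_changelog)
    task.process(records)  # offset 1 was never committed, so N is reprocessed
    return task.local_store["a"]


if __name__ == "__main__":
    print("expected:", run_scenario(restore_from_changelog=True))
    print("actual (bug):", run_scenario(restore_from_changelog=False))
```

In this model the restored run ends with a count of 2 for key "a", while the run that keeps the stale local store ends with 3, because record N is applied on top of state that already includes it.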