Peter Larsen created FLINK-35990:
------------------------------------

             Summary: Lingering Transactions with FlinkKafkaProducer after failures & scale-down
                 Key: FLINK-35990
                 URL: https://issues.apache.org/jira/browse/FLINK-35990
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Kafka
    Affects Versions: 1.17.2, 1.14.3
            Reporter: Peter Larsen

Hi!

I recently hit an issue where lingering transactions are not aborted by FlinkKafkaProducer on 1.14.3. The failure seems to be triggered by a failed restart from a checkpoint, followed by a restart with lower parallelism.

I wrote a test that I think reproduces the issue and pushed it to a fork [here|https://github.com/peterdlarsen/flink/compare/peterdlarsen:c0027e5...peterdlarsen:b4c4750]. I also reproduced it on a local cluster running 1.14.3 and am happy to share more details if that’s useful!

I assume that migrating to KafkaSink is the recommended remediation, rather than fixing FlinkKafkaProducer, but I wanted to report this in case it’s helpful to anyone else.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)