Aleksandr Savonin created FLINK-39218:
-----------------------------------------
Summary: Add CLI tool to manage lingering Kafka transactions
Key: FLINK-39218
URL: https://issues.apache.org/jira/browse/FLINK-39218
Project: Flink
Issue Type: Improvement
Components: Connectors / Kafka
Reporter: Aleksandr Savonin
When a Flink job using the KafkaSink with EXACTLY_ONCE delivery guarantee stops
unexpectedly (e.g., crash), it can leave Kafka transactions in an ONGOING
state. These lingering transactions block downstream consumers that read with
the read_committed isolation level, because the Last Stable Offset (LSO) cannot
advance past the first offset of an open transaction.
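To make the blocking effect concrete, here is a minimal sketch of how the LSO relates to open transactions on a single partition. It is a toy model, not broker code: the class name, method name, and offsets are hypothetical, and it assumes only that the LSO is the smaller of the high watermark and the first offset of the earliest open transaction.

```java
import java.util.List;

/** Toy model of how open transactions hold back the Last Stable Offset (LSO). */
final class LastStableOffsetModel {

    /**
     * Returns the offset up to which (exclusive) a read_committed consumer may read.
     *
     * @param highWatermark offset up to which the partition is fully replicated
     * @param firstOffsetsOfOpenTransactions first offset written by each ONGOING transaction
     */
    static long lastStableOffset(long highWatermark, List<Long> firstOffsetsOfOpenTransactions) {
        long lso = highWatermark;
        for (long firstOffset : firstOffsetsOfOpenTransactions) {
            // The earliest open transaction pins the LSO at its first offset.
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }

    public static void main(String[] args) {
        // No open transactions: consumers can read up to the high watermark.
        System.out.println(lastStableOffset(1000L, List.of()));           // prints 1000
        // A lingering transaction that started at offset 400 pins the LSO there.
        System.out.println(lastStableOffset(1000L, List.of(400L, 750L))); // prints 400
        // Once that transaction is aborted or committed, the LSO moves on.
        System.out.println(lastStableOffset(1000L, List.of(750L)));       // prints 750
    }
}
```

In this model a single lingering transaction at offset 400 hides every later record from read_committed consumers, even records that were written outside any transaction.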
Currently, there is no built-in way to resolve this without restarting the
original Flink job or waiting for the transaction to time out.
This ticket adds a dedicated CLI tool (flink-connector-kafka-transaction-tool)
packaged as a self-contained uber-jar that allows operators to manually resolve
stuck transactions:
* Abort: Connects with the same transactional.id to fence the previous
producer, forcing the broker to abort the open transaction.
* Commit: Resumes a specific transaction using the exact producerId and epoch
from Flink checkpoint state/logs and commits it.
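The two operations can be illustrated with a simplified in-memory model of the bookkeeping kept per transactional.id. This is a sketch under stated assumptions, not the tool's implementation and not Kafka's coordinator: all class and method names are hypothetical. In the real Kafka client, the fencing step corresponds to calling KafkaProducer.initTransactions() with the same transactional.id.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of transactional.id fencing; hypothetical, for illustration only. */
final class TransactionCoordinatorModel {

    enum State { EMPTY, ONGOING, COMPLETE_COMMIT, COMPLETE_ABORT }

    /** Immutable identity a producer holds after initialization. */
    static final class ProducerHandle {
        final long producerId;
        final short epoch;

        ProducerHandle(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    private static final class Entry {
        final long producerId;
        short epoch;
        State state = State.EMPTY;

        Entry(long producerId) {
            this.producerId = producerId;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private long nextProducerId = 1000L;

    /** Registers a producer; for a known id, bumps the epoch and aborts any ONGOING transaction. */
    ProducerHandle init(String transactionalId) {
        Entry entry = entries.get(transactionalId);
        if (entry == null) {
            entry = new Entry(nextProducerId++);
            entries.put(transactionalId, entry);
        } else {
            entry.epoch++; // fences every producer still holding the old epoch
            if (entry.state == State.ONGOING) {
                entry.state = State.COMPLETE_ABORT;
            }
        }
        return new ProducerHandle(entry.producerId, entry.epoch);
    }

    void begin(String transactionalId, ProducerHandle handle) {
        requireCurrent(transactionalId, handle.producerId, handle.epoch).state = State.ONGOING;
    }

    /** Commits only if the caller presents the exact current producerId and epoch. */
    void commit(String transactionalId, long producerId, short epoch) {
        Entry entry = requireCurrent(transactionalId, producerId, epoch);
        if (entry.state != State.ONGOING) {
            throw new IllegalStateException("no ongoing transaction for " + transactionalId);
        }
        entry.state = State.COMPLETE_COMMIT;
    }

    State stateOf(String transactionalId) {
        Entry entry = entries.get(transactionalId);
        return entry == null ? State.EMPTY : entry.state;
    }

    private Entry requireCurrent(String transactionalId, long producerId, short epoch) {
        Entry entry = entries.get(transactionalId);
        if (entry == null || entry.producerId != producerId || entry.epoch != epoch) {
            throw new IllegalStateException("fenced: stale producerId/epoch for " + transactionalId);
        }
        return entry;
    }

    public static void main(String[] args) {
        TransactionCoordinatorModel coordinator = new TransactionCoordinatorModel();

        // Commit path: resume with the recorded producerId and epoch, then commit.
        ProducerHandle crashedA = coordinator.init("sink-0");
        coordinator.begin("sink-0", crashedA);
        coordinator.commit("sink-0", crashedA.producerId, crashedA.epoch);
        System.out.println(coordinator.stateOf("sink-0")); // prints COMPLETE_COMMIT

        // Abort path: re-initializing with the same id fences the old producer.
        ProducerHandle crashedB = coordinator.init("sink-1");
        coordinator.begin("sink-1", crashedB);
        coordinator.init("sink-1");
        System.out.println(coordinator.stateOf("sink-1")); // prints COMPLETE_ABORT
    }
}
```

One consequence visible in the model: the two operations are mutually exclusive for a given transaction. Once the abort path has bumped the epoch, a later commit with the old producerId and epoch is rejected as fenced.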
The tool is implemented as a new Maven module
(flink-connector-kafka-transaction-tool).