Aleksandr Savonin created FLINK-39218:
-----------------------------------------

             Summary: Add CLI tool to manage lingering Kafka transactions
                 Key: FLINK-39218
                 URL: https://issues.apache.org/jira/browse/FLINK-39218
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Kafka
            Reporter: Aleksandr Savonin


When a Flink job that uses the KafkaSink with the EXACTLY_ONCE delivery
guarantee stops unexpectedly (e.g., it crashes), it can leave Kafka
transactions in an ONGOING state. These lingering transactions block downstream
consumers that read with isolation.level=read_committed, because the Last
Stable Offset (LSO) cannot advance past the first offset of an open transaction.
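
To make the blocking effect concrete, here is a minimal toy model of how the
LSO relates to open transactions. It is only an illustration, not broker code:
the class and method names (LsoSketch, lastStableOffset) are hypothetical, and
it assumes the usual definition that the LSO is the smaller of the high
watermark and the first offset of the earliest open transaction.

```java
/** Toy model (not Kafka code) of how the Last Stable Offset is bounded. */
final class LsoSketch {

    /**
     * @param highWatermark       offset up to which the partition is replicated
     * @param openTxnFirstOffsets first offset of each transaction still open
     * @return the offset up to which a read_committed consumer may read
     */
    static long lastStableOffset(long highWatermark, long... openTxnFirstOffsets) {
        long lso = highWatermark;
        // Any open transaction that starts earlier pulls the LSO back.
        for (long firstOffset : openTxnFirstOffsets) {
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }

    public static void main(String[] args) {
        // No open transactions: the LSO follows the high watermark.
        System.out.println(lastStableOffset(1_000L));
        // One lingering transaction that began at offset 420 pins the LSO.
        System.out.println(lastStableOffset(1_000L, 420L));
    }
}
```

In this model a single lingering transaction holds the LSO at its first offset
no matter how much committed data is written after it, which is why
read_committed consumers appear stuck.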

Currently, there is no built-in way to resolve this without restarting the
original Flink job or waiting for the transaction timeout
(transaction.timeout.ms) to expire.

This ticket adds a dedicated CLI tool (flink-connector-kafka-transaction-tool) 
packaged as a self-contained uber-jar that allows operators to manually resolve 
stuck transactions:
 * Abort: Connects with the same transactional.id to fence the previous 
producer, forcing the broker to abort the open transaction.
 * Commit: Resumes a specific transaction using the exact producerId and epoch 
from Flink checkpoint state/logs and commits it.
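
As a rough sketch of the Abort path, the configuration for a fencing producer
might be assembled as below. This is an assumption about how such a tool could
be built, not the tool's actual code: the class and method names
(AbortConfigSketch, fencingProducerProps) are hypothetical, and only the
standard Kafka producer configuration keys are taken as given.

```java
import java.util.Properties;

/** Hypothetical helper: builds producer settings for fencing a stuck producer. */
final class AbortConfigSketch {

    static Properties fencingProducerProps(String bootstrapServers, String transactionalId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Must match the transactional.id of the producer that left the
        // transaction open; otherwise nothing is fenced.
        props.setProperty("transactional.id", transactionalId);
        // No records are sent, so plain byte-array serializers are enough.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and transactional.id, for illustration only.
        Properties props = fencingProducerProps("localhost:9092", "example-transactional-id");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Passing such properties to a KafkaProducer and calling initTransactions() is
what performs the fencing: the broker bumps the producer epoch and aborts any
transaction the previous instance left in progress. The Commit path has no
equivalent public producer API, so it is not sketched here.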

The tool is implemented as a new Maven module 
(flink-connector-kafka-transaction-tool).


