Dong Lin created KAFKA-1565:
-------------------------------

             Summary: Transaction manager failover handling
                 Key: KAFKA-1565
                 URL: https://issues.apache.org/jira/browse/KAFKA-1565
             Project: Kafka
          Issue Type: New Feature
            Reporter: Dong Lin

The transaction manager should guarantee that, once a pre-commit/pre-abort request is acknowledged, the commit/abort request will be delivered to every partition involved in the transaction. In particular, we handle the following failover scenarios:

1) The transaction manager or its followers fail before the txRequest is replicated to the local log and the followers.
Solution: The transaction manager, if it is still alive, responds to the request with an error status. The producer keeps retrying the commit.

2) The txPartition’s leader is not available.
Solution: Put the txRequest on unSentTxRequestQueue. When metadataCache is updated, check unSentTxRequestQueue and re-send each txRequest whose leader is now available.

3) The txPartition’s leader fails while the txRequest is in the channel manager.
Solution: Retrieve all txRequests queued for transmission to this broker and put them on unSentTxRequestQueue.

4) The transaction manager does not receive a success response from the txPartition’s leader within the timeout period.
Solution: The transaction manager expires the txRequest and re-sends it.

5) The transaction manager fails.
Solution: The new transaction manager reads transactionHW from ZooKeeper and sends txRequests starting from transactionHW.
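
The queueing behaviour in scenarios 2 to 4 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Kafka code: the names TxRequest, TxRequestSender, and every method below are hypothetical, time is passed in explicitly as a number, and the "channel manager" is reduced to a per-broker dictionary of in-flight requests. Only the unSentTxRequestQueue idea is taken from the description above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TxRequest:
    tx_id: int
    partition: str  # the txPartition the commit/abort must reach


class TxRequestSender:
    """Toy model of scenarios 2-4 (hypothetical, not a Kafka API)."""

    def __init__(self, leaders, timeout):
        self.leaders = dict(leaders)  # partition -> broker id, None if no leader
        self.timeout = timeout
        self.unsent = deque()         # stands in for unSentTxRequestQueue
        self.in_flight = {}           # broker id -> {request: time sent}

    def send(self, request, now):
        broker = self.leaders.get(request.partition)
        if broker is None:
            # Scenario 2: leader unavailable, so park the request.
            self.unsent.append(request)
        else:
            self.in_flight.setdefault(broker, {})[request] = now

    def on_metadata_update(self, leaders, now):
        # Scenario 2: retry everything parked. Requests whose leader is
        # still unknown land back on the queue.
        self.leaders.update(leaders)
        parked, self.unsent = self.unsent, deque()
        for request in parked:
            self.send(request, now)

    def on_broker_failure(self, broker):
        # Scenario 3: reclaim all requests queued for the failed broker.
        for partition, leader in list(self.leaders.items()):
            if leader == broker:
                self.leaders[partition] = None
        self.unsent.extend(self.in_flight.pop(broker, {}))

    def on_success(self, broker, request):
        self.in_flight.get(broker, {}).pop(request, None)

    def expire(self, now):
        # Scenario 4: re-send requests with no success response in time.
        expired = [
            (broker, request)
            for broker, pending in self.in_flight.items()
            for request, sent_at in pending.items()
            if now - sent_at >= self.timeout
        ]
        for broker, request in expired:
            del self.in_flight[broker][request]
            self.send(request, now)
        return [request for _, request in expired]


sender = TxRequestSender({"p0": 1, "p1": None}, timeout=5)
commit_p0, commit_p1 = TxRequest(7, "p0"), TxRequest(7, "p1")
sender.send(commit_p0, now=0)        # in flight to broker 1
sender.send(commit_p1, now=0)        # parked: p1 has no leader
sender.on_broker_failure(1)          # commit_p0 is reclaimed and parked
sender.on_metadata_update({"p0": 2, "p1": 2}, now=1)  # both go to broker 2
```

In this model a request is never dropped: it is always either parked on the unsent queue or in flight to some broker until a success response arrives, which is the delivery guarantee the issue asks for. Scenario 5 is left out because it depends on how transactionHW is stored and recovered.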