Luke Chen created KAFKA-20716:
---------------------------------

             Summary: LSO stuck after unclean leader election
                 Key: KAFKA-20716
                 URL: https://issues.apache.org/jira/browse/KAFKA-20716
             Project: Kafka
          Issue Type: Bug
            Reporter: Luke Chen


When a partition goes through an unclean leader election, the new leader might contain transactional data without a COMMIT/ABORT marker. However, __transaction_state records the transaction as committed/aborted, so the transaction timeout will never expire it. As a result, the LSO gets stuck and READ_COMMITTED consumers can never make progress.
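To make the mechanism concrete, here is a minimal sketch of how a partition leader could derive the LSO. It is a hypothetical, heavily simplified model: the class PartitionLsoModel and its methods are made up for illustration and are not Kafka's actual implementation. The LSO is modeled as the minimum of the high watermark and the first offset of any transaction that has no COMMIT/ABORT marker yet. If that marker never arrives, the transaction's first offset pins the LSO indefinitely.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of one partition as seen by its leader.
// It is NOT Kafka's implementation; it only tracks what is needed for the LSO.
final class PartitionLsoModel {
    // producerId -> offset of the first record of that producer's open transaction
    private final Map<Long, Long> firstOffsetOfOpenTxn = new HashMap<>();
    private long logEndOffset = 0L;
    private long highWatermark = 0L;

    // Transactional data record; the producer's first one opens a transaction.
    void appendTxnData(long producerId) {
        firstOffsetOfOpenTxn.putIfAbsent(producerId, logEndOffset);
        logEndOffset++;
    }

    // Non-transactional record; it never opens or closes a transaction.
    void appendNonTxnData() {
        logEndOffset++;
    }

    // COMMIT/ABORT control marker; closes the producer's open transaction.
    void appendMarker(long producerId) {
        firstOffsetOfOpenTxn.remove(producerId);
        logEndOffset++;
    }

    // Simplification: assume everything appended so far is fully replicated.
    void advanceHighWatermarkToLogEnd() {
        highWatermark = logEndOffset;
    }

    // LSO = min(high watermark, first offset of every still-open transaction).
    long lastStableOffset() {
        long lso = highWatermark;
        for (long firstOffset : firstOffsetOfOpenTxn.values()) {
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }

    public static void main(String[] args) {
        // New leader after the unclean election: the transactional record
        // survived, but its COMMIT marker did not.
        PartitionLsoModel partition = new PartitionLsoModel();
        partition.appendTxnData(1L);  // offset 0
        partition.appendNonTxnData(); // offset 1
        partition.appendNonTxnData(); // offset 2
        partition.advanceHighWatermarkToLogEnd();
        // The coordinator already considers the transaction complete, so in this
        // model nothing ever calls appendMarker(1L) again.
        System.out.println(partition.lastStableOffset()); // prints 0
    }
}
{code}

In a healthy log, appendMarker(1L) would run after the transactional record, and the LSO would follow the high watermark instead.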

 

Steps to reproduce:

1. Create a cluster with 2 brokers

2. Create a topic with unclean leader election enabled
{code:bash}
bin/kafka-topics.sh --create --topic t1 --bootstrap-server localhost:9091 --replication-factor 2 --config unclean.leader.election.enable=true
{code}
3. Write a transactional record to topic t1, but wait 10 seconds before committing it.

4. Before the transaction from step 3 is committed, shut down the follower broker (assume it is broker 2).

5. After the commit, partition t1-0 on broker 1 contains [offset 0 (data), offset 1 (commit)], but broker 2 only contains [offset 0 (data)].

6. Shut down broker 1. Both brokers are now down, and broker 2 is neither the last leader nor in the ELR.

7. Start broker 2. This triggers an unclean leader election, and broker 2 becomes the leader of t1-0.

8. Start broker 1. It truncates its log for t1-0, which becomes [offset 0 (data)].

9. Append more non-transactional data to t1-0.

10. Consume with READ_COMMITTED isolation: no records are returned.
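The reproduction steps can also be sketched as a tiny simulation. This is a hypothetical model: the class UncleanElectionRepro, its record kinds, and the single transactional producer are assumptions for illustration. Truncation in step 8 is simplified to cutting broker 1's log at broker 2's log end offset, and the high watermark is assumed to equal the log end offset.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of reproduction steps 5-10 for partition t1-0.
// Assumptions: a single transactional producer, HW == log end offset,
// and truncation simplified to "cut the follower at the leader's log end".
final class UncleanElectionRepro {
    enum Kind { TXN_DATA, COMMIT, DATA }

    static List<Kind> truncateTo(List<Kind> log, int leaderLogEndOffset) {
        int keep = Math.min(log.size(), leaderLogEndOffset);
        return new ArrayList<>(log.subList(0, keep));
    }

    // First offset of a transaction with no COMMIT marker yet, else the log end.
    static int lastStableOffset(List<Kind> log) {
        int openTxnStart = -1;
        for (int offset = 0; offset < log.size(); offset++) {
            Kind kind = log.get(offset);
            if (kind == Kind.TXN_DATA && openTxnStart < 0) {
                openTxnStart = offset;
            } else if (kind == Kind.COMMIT) {
                openTxnStart = -1;
            }
        }
        return openTxnStart >= 0 ? openTxnStart : log.size();
    }

    // A READ_COMMITTED fetch only returns data records below the LSO
    // (control markers are not handed to the application).
    static List<Integer> readCommitted(List<Kind> log) {
        List<Integer> visibleOffsets = new ArrayList<>();
        int lso = lastStableOffset(log);
        for (int offset = 0; offset < lso; offset++) {
            if (log.get(offset) != Kind.COMMIT) {
                visibleOffsets.add(offset);
            }
        }
        return visibleOffsets;
    }

    static List<Integer> reproduce() {
        // Step 5: broker 1 has [data, commit]; broker 2 only has [data].
        List<Kind> broker1 = new ArrayList<>(List.of(Kind.TXN_DATA, Kind.COMMIT));
        List<Kind> broker2 = new ArrayList<>(List.of(Kind.TXN_DATA));
        // Steps 6-7: broker 2 wins the unclean leader election.
        List<Kind> leader = broker2;
        // Step 8: broker 1 rejoins and truncates to match the leader: [data].
        broker1 = truncateTo(broker1, leader.size());
        // Step 9: more non-transactional data is appended on the leader.
        leader.add(Kind.DATA);
        leader.add(Kind.DATA);
        // Step 10: the LSO is stuck at 0, so READ_COMMITTED sees nothing.
        return readCommitted(leader);
    }

    public static void main(String[] args) {
        System.out.println(reproduce()); // prints []
    }
}
{code}

With the commit marker still present, e.g. readCommitted on [TXN_DATA, COMMIT, DATA], the same model returns offsets 0 and 2; the only difference is the lost marker.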

 

We have never documented whether unclean leader election is supported with transactions. I think it should be supported, and we need to find a solution for it.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
