[ https://issues.apache.org/jira/browse/KAFKA-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045748#comment-16045748 ]
ASF GitHub Bot commented on KAFKA-5427:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/3297

    KAFKA-5427: Transactional producer should allow FindCoordinator in error state

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-5427

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3297

----
commit 76ffa96261084474cde0db350b96d8e4df9cbc55
Author: Jason Gustafson <ja...@confluent.io>
Date:   2017-06-10T23:31:32Z

    KAFKA-5427: Transactional producer should allow FindCoordinator in error state

----

> Transactional producer cannot find coordinator when trying to abort transaction after error
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-5427
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5427
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: clients, core, producer
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Blocker
>             Fix For: 0.11.0.0
>
>
> It can happen that we receive an abortable error while we are already
> aborting a transaction. In this case, we have an EndTxnRequest queued for
> sending when we transition to ABORTABLE_ERROR. It could be that we need to
> find the coordinator before sending this EndTxnRequest. The problem is that
> we will fail even the FindCoordinatorRequest because we are in an error
> state.
> This causes the following endless loop:
> {code}
> [2017-06-10 19:29:33,436] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,436] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=1000, producerEpoch=0, result=ABORT) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,536] TRACE [TransactionalId my-first-transactional-id] Not sending transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,637] TRACE [TransactionalId my-first-transactional-id] Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=1000, producerEpoch=0, result=ABORT) dequeued for sending (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,637] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,637] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=1000, producerEpoch=0, result=ABORT) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,737] TRACE [TransactionalId my-first-transactional-id] Not sending transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,837] TRACE [TransactionalId my-first-transactional-id] Request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=1000, producerEpoch=0, result=ABORT) dequeued for sending (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,838] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,838] DEBUG [TransactionalId my-first-transactional-id] Enqueuing transactional request (type=EndTxnRequest, transactionalId=my-first-transactional-id, producerId=1000, producerEpoch=0, result=ABORT) (org.apache.kafka.clients.producer.internals.TransactionManager)
> [2017-06-10 19:29:33,938] TRACE [TransactionalId my-first-transactional-id] Not sending transactional request (type=FindCoordinatorRequest, coordinatorKey=my-first-transactional-id, coordinatorType=TRANSACTION) because we are in an error state (org.apache.kafka.clients.producer.internals.TransactionManager)
> {code}
> A couple suggested improvements:
> 1. We should allow FindCoordinator requests regardless of the transaction
> state.
> 2. It is a bit confusing that we allow EndTxnRequest to be sent in both the
> ABORTABLE_ERROR and the ABORTING_TRANSACTION states. Perhaps we should only
> allow EndTxnRequest to be sent in ABORTING_TRANSACTION. If we hit an
> abortable error and we are already aborting, then we should just stay in
> ABORTING_TRANSACTION and perhaps log a warning.
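
The gating problem described in the issue can be illustrated with a small standalone sketch. This is not the code from the pull request or from Kafka's TransactionManager: the class name, the enums, and the three methods below are hypothetical simplifications, and the set of states and request types is deliberately reduced. It only models which queued request is allowed out in which state, first as the report describes it and then with the two suggested improvements applied.

```java
public class TxnRequestGateSketch {
    // Hypothetical, reduced set of transaction states (not Kafka's full state machine).
    enum State { IN_TRANSACTION, ABORTING_TRANSACTION, ABORTABLE_ERROR }

    // Hypothetical, reduced set of queued request types.
    enum Request { FIND_COORDINATOR, ADD_PARTITIONS_TO_TXN, END_TXN_ABORT }

    // Behaviour as described in the report: in the error state only the abort
    // itself may be sent, so the coordinator lookup it depends on is refused.
    static boolean maySendReported(State state, Request request) {
        if (state != State.ABORTABLE_ERROR) {
            return true;
        }
        return request == Request.END_TXN_ABORT;
    }

    // Suggestion 1: coordinator lookups are allowed regardless of state.
    static boolean maySendSuggested(State state, Request request) {
        return request == Request.FIND_COORDINATOR || maySendReported(state, request);
    }

    // Suggestion 2: an abortable error raised while already aborting leaves
    // the state at ABORTING_TRANSACTION instead of moving to ABORTABLE_ERROR.
    static State onAbortableError(State current) {
        return current == State.ABORTING_TRANSACTION
                ? State.ABORTING_TRANSACTION
                : State.ABORTABLE_ERROR;
    }

    public static void main(String[] args) {
        State state = State.ABORTABLE_ERROR;
        System.out.println("reported gate, FindCoordinator:  "
                + maySendReported(state, Request.FIND_COORDINATOR));
        System.out.println("suggested gate, FindCoordinator: "
                + maySendSuggested(state, Request.FIND_COORDINATOR));
        System.out.println("abortable error while aborting -> "
                + onAbortableError(State.ABORTING_TRANSACTION));
    }
}
```

With the reported gate, the abort is dequeued but cannot reach a coordinator, so both requests are enqueued again, which matches the repeating pattern in the log. With the suggested gate the lookup is let through, so the abort would no longer be stuck behind it.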