Jason Gustafson created KAFKA-14830:
---------------------------------------

             Summary: Illegal state error in transactional producer
                 Key: KAFKA-14830
                 URL: https://issues.apache.org/jira/browse/KAFKA-14830
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.1.2
            Reporter: Jason Gustafson


We have seen the following illegal state error in the producer:
{code:java}
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-0:120027 ms has passed since batch creation
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topic-1:120026 ms has passed since batch creation
[Producer clientId=client-id2, transactionalId=transactional-id] Aborting incomplete transaction
[Producer clientId=client-id2, transactionalId=transactional-id] Invoking InitProducerId with current producer ID and epoch ProducerIdAndEpoch(producerId=191799, epoch=0) in order to bump the epoch
[Producer clientId=client-id2, transactionalId=transactional-id] ProducerId set to 191799 with epoch 1
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 4
[Producer clientId=client-id2, transactionalId=transactional-id] Transiting to abortable error state due to org.apache.kafka.common.errors.TimeoutException: The request timed out.
[Producer clientId=client-id2, transactionalId=transactional-id] Uncaught error in request completion:
java.lang.IllegalStateException: TransactionalId transactional-id: Invalid transition attempted from state READY to state ABORTABLE_ERROR
        at org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:1089)
        at org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:508)
        at org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:734)
        at org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:739)
        at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:753)
        at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743)
        at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695)
        at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634)
        at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
        at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562)
        at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836)
        at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
        at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:583)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:575)
        at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328)
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
        at java.base/java.lang.Thread.run(Thread.java:829)
 {code}
The producer hits timeouts, which cause it to abort the active transaction. After 
aborting, the producer bumps its epoch, which transitions it back to the 
`READY` state. Following this, two in-flight requests fail (a `NetworkException` 
and a `TimeoutException`), and handling those failures causes an illegal state 
transition from `READY` to `ABORTABLE_ERROR`. But how could the transaction 
abort complete while there were still in-flight requests? 
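To make the sequence concrete, here is a minimal Java sketch that replays the logged events against a simplified transaction state machine. The class `TxnStateSketch`, its reduced `State` enum and the `VALID_SOURCES` table are hypothetical: they only approximate the rules in Kafka's `TransactionManager` and cover just the states that appear in the log above (for example, the commit path and the fatal-error state are omitted). The point it illustrates is that once the abort and epoch bump have returned the producer to `READY`, a late failure from a request that was still in flight has no valid `ABORTABLE_ERROR` transition to take.

```java
import java.util.Map;
import java.util.Set;

public class TxnStateSketch {
    // Hypothetical subset of the producer's transaction states seen in the log.
    enum State { READY, IN_TRANSACTION, ABORTING_TRANSACTION, ABORTABLE_ERROR }

    // Assumed, simplified table: target state -> states it may be entered from.
    static final Map<State, Set<State>> VALID_SOURCES = Map.of(
        State.READY, Set.of(State.ABORTING_TRANSACTION),
        State.IN_TRANSACTION, Set.of(State.READY),
        State.ABORTING_TRANSACTION, Set.of(State.IN_TRANSACTION, State.ABORTABLE_ERROR),
        State.ABORTABLE_ERROR, Set.of(State.IN_TRANSACTION, State.ABORTABLE_ERROR));

    private State state = State.IN_TRANSACTION;

    State state() {
        return state;
    }

    // Rejects any transition not listed in VALID_SOURCES, leaving the state unchanged.
    void transitionTo(State target) {
        if (!VALID_SOURCES.get(target).contains(state)) {
            throw new IllegalStateException(
                "Invalid transition attempted from state " + state + " to state " + target);
        }
        state = target;
    }

    public static void main(String[] args) {
        TxnStateSketch txn = new TxnStateSketch();
        // Two expired batches (TimeoutException) move the transaction into ABORTABLE_ERROR.
        txn.transitionTo(State.ABORTABLE_ERROR);
        txn.transitionTo(State.ABORTABLE_ERROR);
        // The application aborts; once the epoch bump completes the producer is READY again.
        txn.transitionTo(State.ABORTING_TRANSACTION);
        txn.transitionTo(State.READY);
        // A failure for a request that was still in flight arrives afterwards.
        try {
            txn.transitionTo(State.ABORTABLE_ERROR);
        } catch (IllegalStateException e) {
            // prints "Invalid transition attempted from state READY to state ABORTABLE_ERROR"
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the abort path never checks whether any batches are still outstanding, which is exactly the gap the question above points at.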



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to