[ 
https://issues.apache.org/jira/browse/KAFKA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Bobade updated KAFKA-17380:
---------------------------------
    Description: 
We are using Kafka Streams 2.6.2 and running stateful aggregations with 
exactly-once semantics.

The processing logic is: 

consume input records -> aggregate intermediate results and buffer them in a 
state store backed by a changelog topic -> punctuate every 15 seconds: flush the 
state store and send the aggregated records downstream -> perform the final 
aggregation and send the result to the output topic

Since we use spot instances, one of the pods was restarted, which triggered a 
rebalance, and state was being restored from the changelog topic.

We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer 
attempted an operation with an old epoch. Either there is a newer producer with 
the same transactionalId, or the producer's transaction has been expired by the 
broker.
{quote}
After this, a few partitions were stuck and no records were processed until we 
restarted the application.

We had configured:
 
transaction.timeout.ms to 30 seconds

session.timeout.ms to 30 seconds
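
For reference, the two settings above could be expressed roughly as follows. This is a minimal sketch, not the reporter's actual configuration: the application id, bootstrap address, class name, the `exactly_once` guarantee value, and the use of the `producer.` / `consumer.` key prefixes are all assumptions for illustration.

```java
import java.util.Properties;

public class EosTimeoutConfigSketch {

    // Builds a Properties object of the kind passed to a Kafka Streams application.
    static Properties buildConfig() {
        Properties props = new Properties();
        // Hypothetical placeholders, not taken from the report.
        props.put("application.id", "aggregation-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Assumed: the report says exactly-once semantics but not which setting value.
        props.put("processing.guarantee", "exactly_once");
        // Producer-side transaction timeout: 30 seconds, as stated in the report.
        props.put("producer.transaction.timeout.ms", "30000");
        // Consumer-side session timeout: 30 seconds, as stated in the report.
        props.put("consumer.session.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting settings for inspection.
        buildConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The prefixed keys only make explicit which embedded client each timeout applies to: transaction.timeout.ms is a producer setting and session.timeout.ms is a consumer setting.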

Could you please advise whether there is a known fix for this edge case?

  was:
We are using Kafka Streams 2.6.2 and running stateful aggregations with 
exactly-once semantics.

The processing logic is: 

consume input records -> aggregate intermediate results and buffer them in a 
state store backed by a changelog topic -> punctuate every 15 seconds: flush the 
state store and send the aggregated records downstream -> perform the final 
aggregation and send the result to the output topic

Since we use spot instances, one of the pods was restarted, which triggered a 
rebalance.

We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer 
attempted an operation with an old epoch. Either there is a newer producer with 
the same transactionalId, or the producer's transaction has been expired by the 
broker.
{quote}
After this, a few partitions were stuck and no records were processed until we 
restarted the application.

We had configured:
 
transaction.timeout.ms to 30 seconds

session.timeout.ms to 30 seconds

Could you please advise whether there is a known fix for this edge case?


> Kafka Streams few partition stuck in processing - fixed after restart
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-17380
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17380
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.2
>            Reporter: Rohit Bobade
>            Priority: Major
>
> We are using Kafka Streams 2.6.2 and running stateful aggregations with 
> exactly-once semantics.
> The processing logic is: 
> consume input records -> aggregate intermediate results and buffer them in a 
> state store backed by a changelog topic -> punctuate every 15 seconds: flush 
> the state store and send the aggregated records downstream -> perform the 
> final aggregation and send the result to the output topic
> Since we use spot instances, one of the pods was restarted, which triggered a 
> rebalance, and state was being restored from the changelog topic.
> We noticed ProducerFencedException errors:
> {quote}org.apache.kafka.common.errors.ProducerFencedException: Producer 
> attempted an operation with an old epoch. Either there is a newer producer 
> with the same transactionalId, or the producer's transaction has been expired 
> by the broker.
> {quote}
> After this, a few partitions were stuck and no records were processed until 
> we restarted the application.
> We had configured:
>  
> transaction.timeout.ms to 30 seconds
> session.timeout.ms to 30 seconds
> Could you please advise whether there is a known fix for this edge case?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
