[ https://issues.apache.org/jira/browse/KAFKA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohit Bobade updated KAFKA-17380:
---------------------------------
    Description:
We are using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once semantics.

The processing logic is:
consume input records -> intermediate aggregate and buffer data in a state store backed by a changelog topic -> punctuate every 15 seconds - flush the state store and send aggregated records downstream -> final aggregate operation and send to the output topic

Since we use spot instances, one of the pods was restarted, a rebalance was triggered, and state was being restored from the changelog topic.

We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
{quote}
After this, a few partitions were stuck and no records were processed until we restarted the application.

We had configured:
transaction.timeout.ms to 30 seconds
session.timeout.ms to 30 seconds

Could you please advise if there's any known fix for this edge case?

  was:
We are using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once semantics.

The processing logic is:
consume input records -> intermediate aggregate and buffer data in a state store backed by a changelog topic -> punctuate every 15 seconds - flush the state store and send aggregated records downstream -> final aggregate operation and send to the output topic

Since we use spot instances, one of the pods was restarted and a rebalance was triggered.

We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
{quote}
After this, a few partitions were stuck and no records were processed until we restarted the application.
We had configured:
transaction.timeout.ms to 30 seconds
session.timeout.ms to 30 seconds

Could you please advise if there's any known fix for this edge case?


> Kafka Streams few partition stuck in processing - fixed after restart
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-17380
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17380
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.2
>            Reporter: Rohit Bobade
>            Priority: Major
>
> We are using Kafka Streams 2.6.2 and running stateful aggregations with
> exactly-once semantics.
> The processing logic is:
> consume input records -> intermediate aggregate and buffer data in a state
> store backed by a changelog topic -> punctuate every 15 seconds - flush the
> state store and send aggregated records downstream -> final aggregate
> operation and send to the output topic
> Since we use spot instances, one of the pods was restarted, a rebalance was
> triggered, and state was being restored from the changelog topic.
> We noticed ProducerFencedException errors:
> {quote}org.apache.kafka.common.errors.ProducerFencedException: Producer
> attempted an operation with an old epoch. Either there is a newer producer
> with the same transactionalId, or the producer's transaction has been
> expired by the broker.
> {quote}
> After this, a few partitions were stuck and no records were processed until
> we restarted the application.
> We had configured:
>
> transaction.timeout.ms to 30 seconds
> session.timeout.ms to 30 seconds
> Could you please advise if there's any known fix for this edge case?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)