Rohit Bobade created KAFKA-17380:
------------------------------------

             Summary: Kafka Streams few partition stuck in processing - fixed after restart
                 Key: KAFKA-17380
                 URL: https://issues.apache.org/jira/browse/KAFKA-17380
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.6.2
            Reporter: Rohit Bobade
We are using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once semantics. The processing logic is:

# consume input records
# perform an intermediate aggregation and buffer the data in a state store backed by a changelog topic
# punctuate every 15 seconds: flush the state store and send the aggregated records downstream
# perform the final aggregation and send the result to the output topic

Since we use spot instances, one of the pods was restarted and a rebalance was triggered. We then noticed ProducerFencedException errors:

{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. {quote}

After this, a few partitions were stuck and no records were processed until we restarted the application.

We had configured:

* {{transaction.timeout.ms}} to 30 seconds
* {{session.timeout.ms}} to 30 seconds

Could you please advise whether there is a known fix for this edge case?
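To make the setup above easier to reproduce, here is a minimal sketch of the described configuration and of the buffer-then-flush logic. It is an illustration under stated assumptions, not the reporter's actual code. The names EosConfigSketch and BufferedAggregator, the application.id value and the bootstrap server are hypothetical. The config keys are written as literal strings ("processing.guarantee", and the "producer." / "consumer." client prefixes) so the sketch compiles without the Kafka jars; real code would more likely use StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE and StreamsConfig.producerPrefix(...). BufferedAggregator is a plain-Java model of the state store plus punctuation, not the Kafka Streams Processor API; in the real application the 15-second flush would presumably be a Punctuator registered via ProcessorContext#schedule. The sketch does not reproduce or diagnose the fencing behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: class names, "my-aggregation-app" and "broker:9092"
// are illustrative assumptions, not taken from the report.
public class EosConfigSketch {

    // Settings described in the report, expressed with literal config keys.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "my-aggregation-app"); // hypothetical
        props.put("bootstrap.servers", "broker:9092");      // hypothetical
        props.put("processing.guarantee", "exactly_once");  // EOS on 2.6.x
        // "producer." routes the setting to the producer, "consumer." to the consumers
        props.put("producer.transaction.timeout.ms", "30000"); // 30 seconds
        props.put("consumer.session.timeout.ms", "30000");     // 30 seconds
        return props;
    }

    // Plain-Java model of the intermediate aggregation: values are summed per
    // key in a map standing in for the changelog-backed state store, and
    // punctuate() emits the buffered aggregates and clears the buffer.
    static class BufferedAggregator {
        // TreeMap keeps keys sorted so the flushed output order is deterministic.
        private final Map<String, Long> buffer = new TreeMap<>();

        void process(String key, long value) {
            buffer.merge(key, value, Long::sum);
        }

        List<String> punctuate() {
            List<String> out = new ArrayList<>();
            buffer.forEach((k, v) -> out.add(k + "=" + v));
            buffer.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("processing.guarantee")); // exactly_once
        BufferedAggregator agg = new BufferedAggregator();
        agg.process("sensor-1", 4);
        agg.process("sensor-2", 1);
        agg.process("sensor-1", 6);
        System.out.println(agg.punctuate()); // [sensor-1=10, sensor-2=1]
        System.out.println(agg.punctuate()); // [] (buffer was cleared by the previous flush)
    }
}
```

Clearing the buffer on each punctuation mirrors the "flush state store and send downstream" step: every aggregate is emitted once per interval, and a second flush with no new input emits nothing.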