Patrick Pang created KAFKA-15792:
------------------------------------

             Summary: Kafka Streams stuck partition fixed after restarting the process
                 Key: KAFKA-15792
                 URL: https://issues.apache.org/jira/browse/KAFKA-15792
             Project: Kafka
          Issue Type: New Feature
          Components: streams
    Affects Versions: 3.1.2
            Reporter: Patrick Pang

Our Kafka Streams process is often slow to process one particular partition on a specific instance. No data skew is detected, i.e. partitions are mostly uniformly distributed. The symptom is a huge lag on that specific partition. After restarting the process, the lag drains within 5 minutes of startup. This hints at an internal processing issue in our Streams application rather than a cluster problem or a poison message.

Are there any metrics you would suggest we look at, or is this a known issue? Regularly bouncing the application doesn't look like a proper fix for production systems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)