Ben Stopford created KAFKA-2909:
-----------------------------------

Summary: Another Instance of Gap in Consumption after Restart
Key: KAFKA-2909
URL: https://issues.apache.org/jira/browse/KAFKA-2909
Project: Kafka
Issue Type: Sub-task
Reporter: Ben Stopford

This seems very similar to Rajini's reported KAFKA-2891.

*Context*

The context is the Security Rolling Upgrade test with a 30s consumer timeout. There was a 2s sleep between restarts. Throughput was limited to 1000 messages per second.

*Failure*

At least one acked message did not appear in the consumed messages.

acked_minus_consumed: set(36802, 36804, 36805, 36807, 36808, 36810, 36811, 64403, 64406, 64409, 36799)

The missing data was correctly written to the Kafka data files:

{quote}
value 36802 -> partition 1, offset: 12216

kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files worker7/kafka-data-logs/test_topic-1/00000000000000000000.log | grep 'offset: 12216'
-> offset: 12216 position: 374994 isvalid: true payloadsize: 5 magic: 0 compresscodec: NoCompressionCodec crc: 3001177408

The same entry is present in all three data files, so the data is there.
{quote}

The first missing value was written at 20:42:30,185, about one second before the shutdown of the third node completed (20:42:31,164).

The missing writes coincide with the consumer logging NOT_COORDINATOR_FOR_GROUP and "Marking the coordinator" messages. However, these messages appear many times over a long period, so it is hard to conclude that they are the cause, or that they correlate specifically with the error.

*Timeline*

{quote}
grep -r 'shutdown complete' *

20:42:06,132 - Node 1 shutdown completed
20:42:18,560 - Node 2 shutdown completed
20:42:30,185 - *Writes that never make it are written by producer*
20:42:31,164 - Node 3 shutdown completed
20:42:57,872 - Node 1 shutdown completed
…
{quote}

All logs for this incident are attached.
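
For readers unfamiliar with the acked_minus_consumed check reported under *Failure*: it amounts to a set difference between the values the producer had acknowledged and the values the consumer actually read. The following is only a minimal sketch of that idea, assuming both sides are available as plain lists of integer values. The function name `missing_messages` and the sample data are hypothetical and are not taken from the test code.

```python
def missing_messages(acked, consumed):
    """Return the acked values that never showed up on the consumer side.

    Hypothetical helper: `acked` and `consumed` are assumed to be
    iterables of integer message values collected by a test harness.
    """
    # Set difference: everything acknowledged minus everything consumed.
    return sorted(set(acked) - set(consumed))


if __name__ == "__main__":
    # Toy data, not from the incident: values 3 and 6 were acked
    # by the producer but never seen by the consumer.
    acked = range(10)
    consumed = [0, 1, 2, 4, 5, 7, 8, 9]
    print("acked_minus_consumed:", missing_messages(acked, consumed))
```

An empty result means every acknowledged message was consumed. Duplicated deliveries are ignored by this check, because both inputs are reduced to sets first.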