Hi all. I saw something weird yesterday on our "leaf" instances, which run kafka 0.7.2 (and mirror to kafka 0.8 via our custom code). I fully realize everyone's instinctive response is "upgrade, already", but I'd like to have an internals discussion to better understand what happened, as I suspect it's still relevant in 0.8.

Basically, in one of our topics there was an 8k stretch of nulls. Correlating timestamps from the messages bracketing the nulls to the kafka log, I see that the server restarted during that time, and here are the recovery lines related to the topic with the nulls:

[2014-11-04 00:48:07,602] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2014-11-04 01:00:35,806] INFO Shutting down Kafka server (kafka.server.KafkaServer)
[2014-11-04 01:00:35,813] INFO shutdown scheduler kafka-logcleaner- (kafka.utils.KafkaScheduler)
[2014-11-04 01:01:38,411] INFO Starting Kafka server... (kafka.server.KafkaServer)
...
[2014-11-04 01:01:49,146] INFO Loading log 'foo.bar-0' (kafka.log.LogManager)
[2014-11-04 01:01:49,147] INFO Loading the last segment /var/kafka-leaf-spool/foo.bar-0/00000000002684355423.kafka in mutable mode, recovery true (kafka.log.Log)
[2014-11-04 01:01:55,877] INFO recover high water mark:414004449 (kafka.message.FileMessageSet)
[2014-11-04 01:01:55,877] INFO Recovery succeeded in 6 seconds. 0 bytes truncated. (kafka.message.FileMessageSet)

The only hypothesis I can come up with is that the shutdown ("graceful"?) did not wait for all messages to flush to disk (we're configured with: log.flush.interval=10000, log.default.flush.interval.ms=500, and log.default.flush.scheduler.interval.ms=500), but the max offset was recorded, so that when it came back up, it filled the gap with nulls to reach the valid max offset in case any consumers were at the end.

But for consumers with a position prior to all the nulls, are they guaranteed to get back "on the rails" so-to-speak? Nulls appear as v0 (i.e. "magic") messages of 0 length, but the messages replaced could be variable length.

Thanks in advance for any input,
-neil
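
P.S. To make the "on the rails" question concrete, here is a minimal sketch of the concern. It is not the actual 0.7 read path: it assumes a simplified framing of a 4-byte big-endian length prefix followed by the payload, and ignores the magic byte and CRC. The names frame and read_frames and the sample payloads are hypothetical. Under that assumption, a run of zero bytes parses as a sequence of empty 4-byte frames, so a reader only lands on the next real message if the zeroed region is a multiple of 4 bytes and ends exactly on a message boundary.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Length-prefix a payload: 4-byte big-endian size, then the bytes."""
    return struct.pack(">I", len(payload)) + payload


def read_frames(buf: bytes, start: int = 0):
    """Yield (offset, payload) pairs by walking length prefixes from `start`."""
    pos = start
    while pos + 4 <= len(buf):
        (size,) = struct.unpack_from(">I", buf, pos)
        if pos + 4 + size > len(buf):
            break  # declared size runs past the end of the buffer: stop
        yield pos, buf[pos + 4 : pos + 4 + size]
        pos += 4 + size


before = frame(b"msg-1") + frame(b"msg-2")
after = frame(b"msg-3") + frame(b"msg-4")

# Case 1: the zeroed gap is a multiple of 4 bytes and ends on a boundary.
aligned = before + b"\x00" * 8192 + after
aligned_frames = list(read_frames(aligned))
aligned_payloads = [p for _, p in aligned_frames if p]
empty_frames = sum(1 for _, p in aligned_frames if not p)

# Case 2: the gap is 2 bytes short, so the reader straddles the next prefix.
misaligned = before + b"\x00" * 8190 + after
misaligned_payloads = [p for _, p in read_frames(misaligned) if p]

print(aligned_payloads, empty_frames)
print(misaligned_payloads)
```

In the aligned case the reader steps through the zeros as empty frames and picks up msg-3 and msg-4; in the misaligned case it reads a bogus size out of the middle of the next message and never recovers the later messages. Whether the real consumer behaves like either case is exactly what I'm asking.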