Re: nulls found in topic, created by recovery?

Neha Narkhede Thu, 06 Nov 2014 14:19:00 -0800

IIRC, the bug that introduced the nulls was related to compressed data. Is
this topic compressed? Did you try running a consumer over the topic's
data, or alternatively the DumpLogSegments tool?
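
To measure the extent of the nulls independently of either tool, something
like the following rough sketch could be run against a copy of the segment
file. It is not part of Kafka and knows nothing about the message format;
it only reports long runs of 0x00 bytes. The name find_zero_runs and its
parameters (min_run, chunk_size) are made up for illustration.

```python
def find_zero_runs(path, min_run=1024, chunk_size=1 << 20):
    """Return (start_position, length) for each run of 0x00 bytes in the
    file that is at least min_run bytes long.

    Positions are byte positions within the file, not Kafka offsets.
    Hypothetical helper; it does not parse Kafka messages.
    """
    runs = []
    run_start = None  # file position where the current zero run began
    pos = 0           # file position of the byte being examined
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for byte in chunk:
                if byte == 0:
                    if run_start is None:
                        run_start = pos
                elif run_start is not None:
                    # A non-zero byte ends the current run.
                    if pos - run_start >= min_run:
                        runs.append((run_start, pos - run_start))
                    run_start = None
                pos += 1
    # Handle a run that extends to the end of the file.
    if run_start is not None and pos - run_start >= min_run:
        runs.append((run_start, pos - run_start))
    return runs
```

Calling find_zero_runs() on a copy of the segment named in the recovery
log would show where the stretch starts and how long it is, which can then
be compared against what a consumer or DumpLogSegments reports. The
byte-at-a-time loop is slow on large segments; it is meant as a one-off
check, not a tool.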


On Thu, Nov 6, 2014 at 12:56 PM, Neil Harkins <nhark...@gmail.com> wrote:

> Hi all. I saw something weird yesterday on our "leaf" instances
> which run kafka 0.7.2 (and mirror to kafka 0.8 via our custom code).
> I fully realize everyone's instinctual response is "upgrade, already.",
> but I'd like to have an internals discussion to better understand
> what happened, as I suspect it's still relevant in 0.8.
>
> Basically, in one of our topics there was an 8k stretch of nulls.
> Correlating timestamps from the messages bracketing the nulls
> to the kafka log, I see that the server restarted during that time,
> and here are the recovery lines related to the topic with the nulls:
>
> [2014-11-04 00:48:07,602] INFO zookeeper state changed (SyncConnected)
> (org.I0Itec.zkclient.ZkClient)
> [2014-11-04 01:00:35,806] INFO Shutting down Kafka server
> (kafka.server.KafkaServer)
> [2014-11-04 01:00:35,813] INFO shutdown scheduler kafka-logcleaner-
> (kafka.utils.KafkaScheduler)
> [2014-11-04 01:01:38,411] INFO Starting Kafka server...
> (kafka.server.KafkaServer)
> ...
> [2014-11-04 01:01:49,146] INFO Loading log 'foo.bar-0'
> (kafka.log.LogManager)
> [2014-11-04 01:01:49,147] INFO Loading the last segment
> /var/kafka-leaf-spool/foo.bar-0/00000000002684355423.kafka in mutable
> mode, recovery true (kafka.log.Log)
> [2014-11-04 01:01:55,877] INFO recover high water mark:414004449
> (kafka.message.FileMessageSet)
> [2014-11-04 01:01:55,877] INFO Recovery succeeded in 6 seconds. 0
> bytes truncated. (kafka.message.FileMessageSet)
>
> The only hypothesis I can come up with is that the shutdown
> ("graceful"?) did not wait for all messages to flush to disk
> (we're configured with: log.flush.interval=10000,
> log.default.flush.interval.ms=500, and
> log.default.flush.scheduler.interval.ms=500),
> but the max offset was recorded, so that when it came back up,
> it filled the gap with nulls to reach the valid max offset in case
> any consumers were at the end.
>
> But for consumers with a position prior to all the nulls,
> are they guaranteed to get back "on the rails", so to speak?
> Nulls appear as v0 (i.e. "magic") messages of 0 length,
> but the messages they replaced could be of variable length.
>
> Thanks in advance for any input,
> -neil
>
