The topic is not compressed. The consumer used our fork of the python lib,
which I had to modify to skip over the nulls.
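
For anyone curious, a minimal sketch of that kind of skip-the-nulls guard
(not our actual diff; it assumes the 0.7 framing of a 4-byte big-endian
size prefix in front of each message) would look something like:

    import struct

    def iter_messages(buf):
        # Yield (offset, message_bytes) from a raw message-set buffer,
        # stepping over runs of null padding. A zero size prefix is
        # treated as padding; whether that always re-syncs with the
        # next real message is the open question in the thread below.
        pos = 0
        while pos + 4 <= len(buf):
            (size,) = struct.unpack_from(">i", buf, pos)
            if size <= 0:
                pos += 4          # nulls (or garbage): skip the prefix, rescan
                continue
            if pos + 4 + size > len(buf):
                break             # truncated message at end of buffer
            yield pos, buf[pos + 4:pos + 4 + size]
            pos += 4 + size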

-neil


On Thu, Nov 6, 2014 at 2:16 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> IIRC, the bug that introduced the nulls was related to compressed data. Is
> this topic compressed? Did you try to run a consumer through the topic's
> data or alternately the DumpLogSegments tool?
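> For example (0.8-style invocation; the 0.7 tool's flags may differ):
>
>   bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
>     --files /var/kafka-leaf-spool/foo.bar-0/00000000002684355423.kafka \
>     --print-data-log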
>
> On Thu, Nov 6, 2014 at 12:56 PM, Neil Harkins <nhark...@gmail.com> wrote:
>>
>> Hi all. I saw something weird yesterday on our "leaf" instances
>> which run kafka 0.7.2 (and mirror to kafka 0.8 via our custom code).
>> I fully realize everyone's instinctive response is "upgrade, already,"
>> but I'd like to have an internals discussion to better understand
>> what happened, as I suspect it's still relevant in 0.8.
>>
>> Basically, in one of our topics there was an 8k stretch of nulls.
>> Correlating timestamps from the messages bracketing the nulls
>> to the kafka log, I see that the server restarted during that time,
>> and here are the recovery lines related to the topic with the nulls:
>>
>> [2014-11-04 00:48:07,602] INFO zookeeper state changed (SyncConnected)
>> (org.I0Itec.zkclient.ZkClient)
>> [2014-11-04 01:00:35,806] INFO Shutting down Kafka server
>> (kafka.server.KafkaServer)
>> [2014-11-04 01:00:35,813] INFO shutdown scheduler kafka-logcleaner-
>> (kafka.utils.KafkaScheduler)
>> [2014-11-04 01:01:38,411] INFO Starting Kafka server...
>> (kafka.server.KafkaServer)
>> ...
>> [2014-11-04 01:01:49,146] INFO Loading log 'foo.bar-0'
>> (kafka.log.LogManager)
>> [2014-11-04 01:01:49,147] INFO Loading the last segment
>> /var/kafka-leaf-spool/foo.bar-0/00000000002684355423.kafka in mutable
>> mode, recovery true (kafka.log.Log)
>> [2014-11-04 01:01:55,877] INFO recover high water mark:414004449
>> (kafka.message.FileMessageSet)
>> [2014-11-04 01:01:55,877] INFO Recovery succeeded in 6 seconds. 0
>> bytes truncated. (kafka.message.FileMessageSet)
>>
>> The only hypothesis I can come up with is that the shutdown
>> ("graceful"?) did not wait for all messages to flush to disk
>> (we're configured with log.flush.interval=10000,
>> log.default.flush.interval.ms=500, and
>> log.default.flush.scheduler.interval.ms=500), yet the max offset
>> was still recorded, so that when the broker came back up it padded
>> the gap with nulls out to that max offset, in case any consumers
>> were already at the end.
>>
>> But for consumers whose position is prior to all the nulls,
>> are they guaranteed to get back "on the rails", so to speak?
>> The nulls parse as v0 (i.e. "magic" byte 0) messages of zero length,
>> but the messages they replaced could be of variable length.
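>> To make that concrete: a reader that steps over a zero length prefix
>> 4 bytes at a time only re-synchronizes if the null run ends on a
>> 4-byte boundary relative to where it started; if the run were, say,
>> 8190 bytes, the reader would land 2 bytes short, and its next
>> "length" field would be two null bytes plus the first two bytes of
>> a real message's length prefix.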
>>
>> Thanks in advance for any input,
>> -neil
>
>
