Nikos Liv created KAFKA-8498:
--------------------------------

             Summary: log-cleaner CorruptRecordException with __consumer_offsets
                 Key: KAFKA-8498
                 URL: https://issues.apache.org/jira/browse/KAFKA-8498
             Project: Kafka
          Issue Type: Bug
          Components: consumer, log
    Affects Versions: 1.0.1
            Reporter: Nikos Liv


Hello,

We have observed the following issue:

We had a Java consumer on the same version as the reported Kafka (1.0.1). 
This consumer was calling commitSync() every couple of milliseconds even when 
poll() returned no messages; in fact, the commit was issued right after the 
poll timeout. The consumer sees occasional low message peaks, but most of the 
time it receives no messages at all.
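
For context, the loop looked roughly like the following sketch (class name, 
topic, group id, broker address, and timeouts are illustrative, not our exact 
code; it assumes the kafka-clients 1.0.1 library on the classpath):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdleCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("group.id", "example-group");           // hypothetical group id
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                // Short poll timeout, so this loop spins every few
                // milliseconds when the topic is idle.
                ConsumerRecords<String, String> records = consumer.poll(5);
                // commitSync() is called unconditionally after the poll
                // timeout, even when no records were returned -- the
                // behavior described above.
                consumer.commitSync();
                // ... process records ...
            }
        }
    }
}
```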

After changing this consumer behavior, the problem no longer appears. 

The Kafka setup has 3 brokers with a replication factor of 2. The disks are 
Ceph block storage devices exposed as OpenStack Cinder volumes. 

We noticed that at some point, when the log-cleaner thread tried to compact 
the __consumer_offsets partition for this topic, it failed with:

CorruptRecordException: Record size is less than the minimum record overhead 
(14)

This caused the log cleaner to stop, which filled up the available disk space 
and eventually made Kafka stop working, failing the whole system.

Are there any known issues similar to this case?

Is it possible that this type of consumer behavior can cause such an issue?

It appears that the consumer sends data when we call commitSync() even if it 
has received no messages; what is the expected behavior in this case?

Is it possible for a consumer to send a corrupted message to Kafka, or for 
Kafka to corrupt a message on disk or during replication?

Could you provide some guidance on the actions needed to troubleshoot this?

Thanks in advance for your effort.

Br,

Nikos



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
