Dmitry Bugaychenko created KAFKA-1539:
-----------------------------------------

             Summary: Due to OS caching Kafka might lose offset files, which 
causes full reset of data
                 Key: KAFKA-1539
                 URL: https://issues.apache.org/jira/browse/KAFKA-1539
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 0.8.1.1
            Reporter: Dmitry Bugaychenko
            Assignee: Jay Kreps


Seen this while testing power failures and disk failures. Due to caching at the
OS level (e.g. XFS can cache data for 30 seconds), after a failure we got offset
files of zero length. This dramatically slows down broker startup (it has to
re-check all segments), and if the high watermark offsets are lost it simply
erases all data and starts recovering from other brokers (looks funny: first
spending 2-3 hours re-checking logs and then deleting them all due to the
missing high watermark).

Proposal: introduce offset file rotation. Keep two versions of the offset file,
write to the oldest, read from the newest valid one. In this case we would be
able to configure the offset checkpoint time in a way that at least one file is
always flushed and valid.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
