[ https://issues.apache.org/jira/browse/KAFKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060456#comment-14060456 ]
Dmitry Bugaychenko commented on KAFKA-1539:
-------------------------------------------

This is not about the log files themselves, but about the checkpoint offset files:
{code}
-rw-r--r-- 1 root root 158 Jul 14 12:11 recovery-point-offset-checkpoint
-rw-r--r-- 1 root root 163 Jul 14 12:11 replication-offset-checkpoint
-rw-r--r-- 1 root root   0 May 28 13:09 cleaner-offset-checkpoint
{code}
If recovery-point-offset-checkpoint gets corrupted, broker startup slows down dramatically (to hours). If replication-offset-checkpoint gets corrupted, the broker removes all the data it has and starts recovering from other replicas. If both get corrupted, you get both: the broker spends hours checking log segment files and then removes them all.

> Due to OS caching Kafka might loose offset files which causes full reset of data
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-1539
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1539
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8.1.1
>            Reporter: Dmitry Bugaychenko
>            Assignee: Jay Kreps
>
> Seen this while testing power failures and disk failures. Due to caching at
> the OS level (e.g. XFS can cache data for 30 seconds), after a failure we got
> offset files of zero length. This dramatically slows down broker startup (it
> has to re-check all segments), and if the high watermark offsets are lost it
> simply erases all data and starts recovering from other brokers (looks funny:
> first spending 2-3 hours re-checking logs and then deleting them all due to
> the missing high watermark).
> Proposal: introduce offset file rotation. Keep two versions of the offset
> file, write to the oldest, read from the newest valid one. In this case we
> would be able to configure the offset checkpoint time in a way that at least
> one file is always flushed and valid.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
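
The rotation proposed in the issue description (keep two versions, write to the oldest, read from the newest valid one) could look roughly like the sketch below. This is only an illustration, not Kafka's code or its checkpoint file format: the class name `RotatingCheckpoint`, the `.0`/`.1` file suffixes, the JSON body, and the generation/CRC32 header used to decide which slot is "newest" and "valid" are all assumptions made for the example.

```python
import json
import os
import zlib


class RotatingCheckpoint:
    """Two-slot checkpoint: write to the oldest slot, read the newest valid one."""

    def __init__(self, base_path):
        # Hypothetical naming scheme: <base_path>.0 and <base_path>.1
        self.slots = [f"{base_path}.0", f"{base_path}.1"]

    def _load(self, path):
        """Return (generation, offsets) if the slot is intact, else None."""
        try:
            with open(path, "rb") as f:
                header, _, body = f.read().partition(b"\n")
            generation, crc = (int(x) for x in header.split())
            if zlib.crc32(body) != crc:
                return None  # torn or corrupted write
            return generation, json.loads(body)
        except (OSError, ValueError):
            # Missing file, zero-length file, or unparsable content.
            return None

    def read(self):
        """Return the offsets from the newest valid slot, or None if none is valid."""
        valid = [s for s in map(self._load, self.slots) if s is not None]
        if not valid:
            return None
        return max(valid, key=lambda s: s[0])[1]

    def write(self, offsets):
        """Overwrite the oldest (or invalid) slot, leaving the other one untouched."""
        states = [self._load(p) for p in self.slots]
        generations = [s[0] if s is not None else -1 for s in states]
        target = generations.index(min(generations))
        next_generation = max(generations) + 1
        body = json.dumps(offsets, sort_keys=True).encode("utf-8")
        header = f"{next_generation} {zlib.crc32(body)}\n".encode("ascii")
        with open(self.slots[target], "wb") as f:
            f.write(header + body)
            f.flush()
            os.fsync(f.fileno())  # push the data past the OS cache
```

With this layout, a crash during a write can damage only the slot being overwritten; the other slot still holds the previous checkpoint, so a reader falls back to slightly stale offsets instead of finding a zero-length file. Whether the `fsync` call is affordable at the configured checkpoint interval is a separate question that the sketch does not address.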