[ https://issues.apache.org/jira/browse/KAFKA-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Howard updated KAFKA-1394:
-------------------------------
    Attachment: unflushed_message_expire.patch

> Ensure last segment isn't deleted on expiration when there are unflushed messages
> ---------------------------------------------------------------------------------
>
> Key: KAFKA-1394
> URL: https://issues.apache.org/jira/browse/KAFKA-1394
> Project: Kafka
> Issue Type: Improvement
> Components: log
> Affects Versions: 0.8.0
> Reporter: Nick Howard
> Assignee: Jay Kreps
> Priority: Minor
> Attachments: unflushed_message_expire.patch
>
>
> We have observed that Kafka will sometimes flush messages to a file that is
> immediately deleted due to expiration. This happens because the LogManager's
> predicate for deleting expired segments is based on the file system modified
> time. The modified time reflects the last time messages were flushed to disk,
> so when there are messages waiting to be flushed, those are not considered in
> the current cleanup strategy. When the last segment is expired, but has
> unflushed messages, the deleteOldSegments method will do a roll, then delete
> all the segments. Rolls begin by flushing to the last segment, so the
> unflushed messages are flushed, then deleted.
> It looks like this:
> * messages appended, but not enough to trigger a flush
> * LogManager begins cleaning expired logs
> * predicate checks modified time of last segment -- it's too old
> * since all segments are old, it does a roll
> * messages flushed to last segment
> * last segment deleted
> If this happens in between consumer reads, the messages will never be seen
> downstream.
> Patch:
> The patch changes the deletion logic so that if the log has unflushed
> messages, the last segment will not be deleted.
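
For readers without the attachment open, here is a rough Java sketch of the guard described above. It is not the code from unflushed_message_expire.patch. The class, method, and field names (RetentionSketch, expiredSegments, Segment, unflushedMessages) are hypothetical, and it assumes expired segments are selected oldest-first, stopping at the first segment that is still within retention.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {
    /** Hypothetical stand-in for a log segment; only the modified time matters here. */
    public static final class Segment {
        final String name;
        final long lastModifiedMs;

        public Segment(String name, long lastModifiedMs) {
            this.name = name;
            this.lastModifiedMs = lastModifiedMs;
        }
    }

    /**
     * Returns the segments that may be deleted, oldest first.
     * A segment counts as expired when its modified time is older than retentionMs.
     * The last (active) segment is never returned while unflushedMessages > 0,
     * because its modified time does not yet reflect those pending messages.
     */
    public static List<Segment> expiredSegments(List<Segment> segments,
                                                long nowMs,
                                                long retentionMs,
                                                long unflushedMessages) {
        List<Segment> deletable = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            Segment s = segments.get(i);
            boolean expired = nowMs - s.lastModifiedMs > retentionMs;
            if (!expired) {
                break; // assumption: stop at the first segment still within retention
            }
            boolean isLast = (i == segments.size() - 1);
            if (isLast && unflushedMessages > 0) {
                break; // the guard: keep the last segment while messages are unflushed
            }
            deletable.add(s);
        }
        return deletable;
    }
}
```

With two expired segments and three unflushed messages, this returns only the older segment. With no unflushed messages it returns both, which corresponds to the case where a roll followed by deleting every old segment is safe.
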
> The patch also widens the lock synchronization back to where it was earlier.
> This prevents a race between the decision to delete the last segment and an
> append that arrives during the expired segment cleanup, which would leave
> unflushed messages that then hit this issue.
> I've also got a backport for 0.7.



--
This message was sent by Atlassian JIRA
(v6.2#6252)