Petr Plavjaník created KAFKA-5155:
-------------------------------------

                 Summary: Messages can be deleted prematurely when some producers use timestamps and some not
                     Key: KAFKA-5155
                     URL: https://issues.apache.org/jira/browse/KAFKA-5155
                 Project: Kafka
              Issue Type: Bug
              Components: log
        Affects Versions: 0.10.2.0
                Reporter: Petr Plavjaník
Some messages can be deleted prematurely and never read in the following scenario. One producer uses timestamps and produces messages that are appended at the beginning of a log segment. Another producer produces messages without a timestamp. In that case the segment's largest timestamp is determined by the old messages that carried a timestamp; the new messages without a timestamp do not affect it, so the log segment containing both old and new messages can be deleted immediately after the last new message with no timestamp is appended. When all appended messages have no timestamp, they are not deleted, because the {{lastModified}} attribute of the {{LogSegment}} is used instead.

New test case for {{kafka.log.LogTest}} that fails:

{code}
@Test
def shouldNotDeleteTimeBasedSegmentsWhenTimestampIsNotProvidedForSomeMessages() {
  val retentionMs = 10000000
  val old = TestUtils.singletonRecords("test".getBytes, timestamp = 0)
  val set = TestUtils.singletonRecords("test".getBytes, timestamp = -1)
  val log = createLog(set.sizeInBytes, retentionMs = retentionMs)

  // append some messages to create some segments
  log.append(old)
  for (_ <- 0 until 14)
    log.append(set)

  log.deleteOldSegments()
  assertEquals("There should be 3 segments remaining", 3, log.numberOfSegments)
}
{code}

It can be prevented by using {{def largestTimestamp = Math.max(maxTimestampSoFar, lastModified)}} in {{LogSegment}}, or by using the current timestamp when messages with timestamp {{-1}} are appended.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
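To illustrate the first proposed fix, here is a minimal standalone sketch (not actual Kafka source; {{SegmentInfo}} and {{shouldDelete}} are hypothetical stand-ins for the segment retention check) showing how falling back to {{lastModified}} keeps a freshly written segment alive even when an old message pinned {{maxTimestampSoFar}} near zero:

```scala
// Sketch only: models the proposed largestTimestamp fix, not Kafka's real LogSegment.
object LargestTimestampSketch {
  // maxTimestampSoFar: largest message timestamp seen in the segment (-1/0 era values possible)
  // lastModified: file modification time of the segment, in ms
  case class SegmentInfo(maxTimestampSoFar: Long, lastModified: Long) {
    // Proposed fix: never report a timestamp older than the file mtime, so a
    // segment holding timestamp-less messages is not reaped prematurely.
    def largestTimestamp: Long = Math.max(maxTimestampSoFar, lastModified)
  }

  // Time-based retention check: delete once the segment's newest data is older
  // than retentionMs.
  def shouldDelete(seg: SegmentInfo, now: Long, retentionMs: Long): Boolean =
    now - seg.largestTimestamp > retentionMs

  def main(args: Array[String]): Unit = {
    val now = 20000000L
    val retentionMs = 10000000L
    // An old message set maxTimestampSoFar to 0, but the file was just written:
    // with the fix, largestTimestamp = lastModified, so the segment survives.
    val justWritten = SegmentInfo(maxTimestampSoFar = 0L, lastModified = now)
    println(shouldDelete(justWritten, now, retentionMs)) // prints false
  }
}
```

Without the {{Math.max}} fallback, the same check would use {{maxTimestampSoFar}} alone (0 in the example) and delete the segment immediately, which is the premature deletion described above.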