[ https://issues.apache.org/jira/browse/KAFKA-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994575#comment-15994575 ]

Michal Borowiecki commented on KAFKA-5155:
------------------------------------------

Hi @huxi,

Personally, I feel the similarity is superficial. KAFKA-4398 is about consuming messages in timestamp order, which challenges the current design and basically calls for a new feature.

This ticket, on the other hand, reports a defect with potential data loss, which violates at-least-once semantics. It does not challenge the design; it simply points out that one line of code needs changing to cater for the case where messages with and without timestamps are appended to the same segment, which IMHO is a non-contentious bugfix.

> Messages can be deleted prematurely when some producers use timestamps and some not
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-5155
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5155
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.10.2.0
>            Reporter: Petr Plavjaník
>
> Some messages can be deleted prematurely and never read in the following scenario. A producer uses timestamps and produces messages that are appended to the beginning of a log segment. Another producer produces messages without a timestamp. In that case the largest timestamp is set by the old messages with a timestamp, the new messages without a timestamp do not influence it, and the log segment containing both old and new messages can be deleted immediately after the last new message with no timestamp is appended. When all appended messages have no timestamp, they are not deleted because the {{lastModified}} attribute of a {{LogSegment}} is used.
>
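
To make the scenario quoted above concrete, here is a minimal, self-contained Scala sketch of the retention check. It is an illustration only, not Kafka's code: `SegmentModel`, `largestTimestampReported`, `largestTimestampProposed` and `expired` are hypothetical names, and the "reported" variant is an assumption inferred from the description (record timestamps win whenever one exists, otherwise {{lastModified}} is used). Only `maxTimestampSoFar`, `lastModified` and the `Math.max` fix come from the report itself.

```scala
object RetentionSketch {
  // Hypothetical, simplified stand-in for a log segment; not kafka.log.LogSegment.
  // maxTimestampSoFar is -1 when no appended record carried a timestamp.
  final case class SegmentModel(maxTimestampSoFar: Long, lastModified: Long) {
    // Assumed pre-fix behaviour: fall back to lastModified only when
    // the segment holds no record timestamp at all.
    def largestTimestampReported: Long =
      if (maxTimestampSoFar >= 0) maxTimestampSoFar else lastModified

    // Fix proposed in the report: an old record timestamp can no longer
    // hide more recent appends that carried no timestamp.
    def largestTimestampProposed: Long =
      Math.max(maxTimestampSoFar, lastModified)
  }

  // Simplified time-based retention rule: a segment is deletable once its
  // largest timestamp is more than retentionMs in the past.
  def expired(largestTimestamp: Long, nowMs: Long, retentionMs: Long): Boolean =
    nowMs - largestTimestamp > retentionMs

  def main(args: Array[String]): Unit = {
    val retentionMs = 10000000L     // same value as in the reported test case
    val nowMs       = 1493800000000L // arbitrary fixed "current time" for the example

    // One old record with timestamp 0, followed by untimestamped records
    // appended just now (so the segment file was modified at nowMs).
    val mixed = SegmentModel(maxTimestampSoFar = 0L, lastModified = nowMs)

    println(expired(mixed.largestTimestampReported, nowMs, retentionMs)) // true: deleted prematurely
    println(expired(mixed.largestTimestampProposed, nowMs, retentionMs)) // false: segment is kept
  }
}
```

With a segment that holds no timestamps at all (`maxTimestampSoFar = -1`), both variants fall back to `lastModified`, which matches the report's remark that such segments are not deleted early.
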
> New test case to {{kafka.log.LogTest}} that fails:
> {code}
> @Test
> def shouldNotDeleteTimeBasedSegmentsWhenTimestampIsNotProvidedForSomeMessages() {
>   val retentionMs = 10000000
>   val old = TestUtils.singletonRecords("test".getBytes, timestamp = 0)
>   val set = TestUtils.singletonRecords("test".getBytes, timestamp = -1, magicValue = 0)
>   val log = createLog(set.sizeInBytes, retentionMs = retentionMs)
>   // append some messages to create some segments
>   log.append(old)
>   for (_ <- 0 until 12)
>     log.append(set)
>   assertEquals("No segment should be deleted", 0, log.deleteOldSegments())
> }
> {code}
> It can be prevented by using {{def largestTimestamp = Math.max(maxTimestampSoFar, lastModified)}} in LogSegment, or by using current timestamp when messages with timestamp {{-1}} are appended.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)