[ https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603377#comment-15603377 ]
Jun Rao commented on KAFKA-4099:
--------------------------------

[~becket_qin], thanks for the explanation. What you described makes sense. So the issue is probably not that bad, since the log won't be rolled as frequently as I thought. In the worst case, if we hit this issue, we may temporarily create twice as many segments as we ideally want. However, since this is relatively rare, we can probably just leave the current implementation as it is.

A related issue concerns log retention. Suppose that an app reprocesses data from more than 7 days ago. Those data will be written to a log segment, only to be deleted when the log retention thread kicks in, at which point a new segment will be rolled. So, in this case, a log will be rolled as frequently as log.retention.check.interval.ms, which defaults to 5 mins. I am wondering if we should improve this by configuring log.message.timestamp.difference.max.ms to match log.retention.ms. This will prevent older data from being unnecessarily written to the log. It will help time-based log rolling as well.

> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs,
> the newly created replica may have messages with old timestamps, causing a
> log segment roll for each message. The fix is to base log rolling only on
> the message timestamp when the messages are in message format 0.10.0 or
> above. If the first message in the segment does not have a timestamp, we
> will fall back to using the wall clock time for log rolling.
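
For reference, the retention suggestion in the comment above could look roughly like the following broker configuration. This is only a sketch: the 7-day value is an assumption taken from the "more than 7 days ago" scenario, not a value stated in the issue, and the timestamp bound only takes effect when log.message.timestamp.type is CreateTime.

```properties
# Sketch only; values are illustrative assumptions, not taken from the issue.

# Retain log data for 7 days (7 * 24 * 60 * 60 * 1000 ms).
log.retention.ms=604800000

# Reject messages whose timestamp differs from the broker's receive time by
# more than the retention period, so data that is already past retention is
# not written only to be deleted at the next retention check.
log.message.timestamp.difference.max.ms=604800000

# How often the retention thread runs (5 minutes, the default cited above).
log.retention.check.interval.ms=300000
```

With the two 7-day settings matched, a producer replaying data older than the retention window would have its messages rejected at append time rather than triggering a segment roll at each retention check.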