[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603377#comment-15603377
 ] 

Jun Rao commented on KAFKA-4099:
--------------------------------

[~becket_qin], thanks for the explanation. What you described makes sense. So the 
issue is probably not that bad since the log won't be rolled as frequently as I 
thought. In the worst case, if we hit this issue, we may create twice as many 
segments as we ideally want to have in the interim. However, since this is 
relatively rare, we can probably just leave the current implementation as it is.

A related issue concerns log retention. Suppose that an app reprocesses data 
from more than 7 days ago (the default retention period). That data will be 
written to a log segment only to be deleted when the log retention thread kicks 
in, at which point a new segment will be rolled. So, in this case, a log will 
be rolled as frequently as log.retention.check.interval.ms, which defaults to 
5 minutes. I am wondering if we should improve this by configuring 
log.message.timestamp.difference.max.ms to match log.retention.ms. This would 
prevent older data from being unnecessarily written to the log. It would help 
time-based log rolling as well.
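
To make the suggestion concrete, here is a minimal sketch of the kind of bound 
that log.message.timestamp.difference.max.ms implies when it is set equal to 
log.retention.ms. The class and method names are hypothetical and this is not 
the broker's actual code; it assumes a 7-day retention and an absolute-difference 
check against the broker's clock. The point is only that a message already older 
than the retention window would be rejected on append instead of being written 
and then deleted at the next retention check.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not Kafka's broker code.
public class TimestampBoundSketch {
    // Assumed values: 7-day retention, with the max timestamp
    // difference configured to match it, as proposed above.
    static final long RETENTION_MS = TimeUnit.DAYS.toMillis(7);
    static final long MAX_TIMESTAMP_DIFF_MS = RETENTION_MS;

    // Returns true if a message with this timestamp stays within the
    // allowed distance from the broker's current time.
    static boolean accepted(long messageTimestampMs, long nowMs) {
        return Math.abs(nowMs - messageTimestampMs) <= MAX_TIMESTAMP_DIFF_MS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long eightDaysAgo = now - TimeUnit.DAYS.toMillis(8);
        long oneHourAgo = now - TimeUnit.HOURS.toMillis(1);
        System.out.println("8 days old accepted: " + accepted(eightDaysAgo, now));
        System.out.println("1 hour old accepted: " + accepted(oneHourAgo, now));
    }
}
```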

> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps, causing a 
> log segment roll for each message. The fix is to base the log rolling 
> decision only on the message timestamp when the messages are in 
> message format 0.10.0 or above. If the first message in the segment does not 
> have a timestamp, we will fall back to using the wall clock time for log rolling.
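
The rolling rule in the description above can be sketched roughly as follows. 
This is an illustration under assumptions, not the actual patch: the class name, 
method name, and parameters are hypothetical, and an empty OptionalLong stands 
in for "the first message in the segment has no timestamp".

```java
import java.util.OptionalLong;

// Hypothetical illustration only; not the code from the KAFKA-4099 patch.
public class TimeBasedRollSketch {
    // Decide whether the active segment should be rolled before appending.
    // firstMessageTimestampMs: timestamp of the segment's first message, if any.
    // segmentCreatedMs: wall clock time at which the segment was created.
    // rollMs: the configured time-based roll interval.
    static boolean shouldRoll(OptionalLong firstMessageTimestampMs,
                              long segmentCreatedMs,
                              long incomingTimestampMs,
                              long nowMs,
                              long rollMs) {
        if (firstMessageTimestampMs.isPresent()) {
            // Timestamped messages: compare message timestamps only.
            return incomingTimestampMs - firstMessageTimestampMs.getAsLong() > rollMs;
        }
        // No timestamp on the first message: fall back to wall clock time.
        return nowMs - segmentCreatedMs > rollMs;
    }
}
```

With this rule, a relocated replica receiving old messages compares each one 
against the first message in the segment rather than against the current time, 
so a run of uniformly old messages does not trigger a roll per message.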



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
