[ https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596965#comment-15596965 ]
Jiangjie Qin commented on KAFKA-4099:
-------------------------------------

[~junrao] Thanks for the explanation. I agree that it is reasonable to roll the log segment based on create time. However, I have a few concerns about the original proposal:

1. How rare replica movement is seems to depend on scale. For example, today we have over 1800 brokers at LI and 1-2 brokers die every day, so partition reassignment happens almost every day. I think there is a difference between "rare at small scale" and "rare regardless of scale".
2. An incorrect create time does not only occur when partitions are moved. It seems most Linux systems do not keep a create time for files, so the create time of a segment would be lost when the brokers are rebooted.

Actually, after thinking about the oscillating timestamp case again, I am not sure whether it would cause frequent log rolling or not. Let's say we have two producers: one produces messages with the current timestamp, and the other produces messages with timestamps that are 7 days old. Assume the current active segment is segment 0 and the current time is T.

Because log rolling is based on the timestamp of the first message in a log segment, it is possible that the first timestamp in segment 0 is 7 days old (T - 7 days). Once we append a message with the current timestamp T, segment 1 is rolled out and its first timestamp will be T. Segment 1 therefore won't roll immediately like the previous one, i.e. segment 2 will only be rolled out when segment 1 sees a timestamp greater than (T + log.roll.ms), and so on.

In the above example, it is also possible that segment 2 is rolled out because of the segment size. In that case, segment 2 may have a first timestamp of (T - 7 days), so segment 3 may get rolled out immediately. But segment 3 will then again wait until either the segment is full or it sees a larger timestamp that triggers log rolling. So in the worst case, we may roll out two new segments in a row.
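To make the scenario above concrete, here is a rough Python sketch of the rolling rule as described in this comment. It is a toy model, not Kafka's code: `simulate_rolls`, `ROLL_MS`, and the message-count cap (standing in for the segment size limit in bytes) are hypothetical names and simplifications, and both producers are assumed to keep sending fixed timestamps (T and T - 7 days).

```python
DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical values for illustration only.
T = 100 * DAY_MS          # "current" timestamp used by the first producer
OLD = T - 7 * DAY_MS      # 7-day-old timestamp used by the second producer
ROLL_MS = 1 * DAY_MS      # stands in for log.roll.ms


def simulate_rolls(timestamps, roll_ms, max_msgs_per_segment):
    """Return (message index, reason) for every roll of a new segment.

    A roll happens before appending a message when either
    - the active segment already holds max_msgs_per_segment messages
      (a stand-in for the size limit), or
    - the message timestamp exceeds the first timestamp in the active
      segment by more than roll_ms.
    """
    rolls = []
    first_ts = None  # timestamp of the first message in the active segment
    count = 0        # number of messages in the active segment
    for i, ts in enumerate(timestamps):
        if first_ts is not None:
            if count >= max_msgs_per_segment:
                reason = "size"
            elif ts - first_ts > roll_ms:
                reason = "time"
            else:
                reason = None
            if reason is not None:
                rolls.append((i, reason))
                first_ts, count = None, 0
        if first_ts is None:
            first_ts = ts  # this message opens the new segment
        count += 1
    return rolls


# The two producers alternate: OLD, T, OLD, T, ...
print(simulate_rolls([OLD, T] * 5, ROLL_MS, 3))
```

Under these assumptions the script prints `[(1, 'time'), (4, 'size'), (5, 'time'), (8, 'size'), (9, 'time')]`. Each time-based roll directly follows a segment that started with the old timestamp, and at most two rolls happen back to back (a size-based roll followed by a time-based one), which matches the worst case described above.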
I am not sure how bad rolling two segments in a row would be in terms of performance. Admittedly, with certain timestamp patterns, frequent log rolling may still happen. I am curious: did you see any real timestamp pattern that caused frequent log rolling?

> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs,
> the newly created replica may have messages with old timestamps, causing the
> log segment to roll for each message. The fix is to base the log rolling
> behavior only on the message timestamp when the messages are in
> message format 0.10.0 or above. If the first message in the segment does not
> have a timestamp, we will fall back to using the wall clock time for log rolling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)