[ https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596037#comment-15596037 ]
Jun Rao commented on KAFKA-4099:
--------------------------------

I had two use cases of time-based rolling in mind. The first is for users who don't want to retain a message (say, sensitive data) in the log for too long. In this case, we want to be able to roll the log periodically based on time, so that rolling freezes the largest timestamp in the rolled segment and causes the segment to be deleted when the time limit has been reached. The second is to let log cleaning happen sooner, since the cleaner never cleans the active segment. In both cases, we really just want to be able to roll the log at some predictable time interval. Different implementations can achieve this.

The issue with the current implementation is that if data with oscillating timestamps are published at the same time, it causes the log to roll too quickly, which will surprise people. We can ask people to turn off log rolling in most cases. However, the default log rolling interval is 7 days, and people could hit this issue before realizing it. In some rare cases, people may indeed want to configure time-based log rolling and may still send data with oscillating timestamps. It would be good if the underlying system could support this without any performance impact.

As for a better implementation, the original approach of rolling based only on create time addresses both use cases in the common case, without the risk of rolling too frequently. The only drawback is that the create time will be reset when segments get moved. However, that happens rarely. So, if there are no better solutions that we can think of, this could be a safer implementation.

> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs,
> the newly created replica may have messages with old timestamps and cause the
> log segment to roll for each message. The fix is to change the log rolling
> behavior to be based only on the message timestamp when the messages are in
> message format 0.10.0 or above. If the first message in the segment does not
> have a timestamp, we will fall back to using the wall clock time for log rolling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
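
To make the rolling rules discussed in this thread concrete, here is a toy simulation. It is a sketch only and is not Kafka's code: the `Segment` class, the three `waited_*` functions, and the exact form of the pre-fix rule (wall clock minus the first message timestamp in the segment) are assumptions inferred from the issue description above. It replays a relocated replica receiving messages whose timestamps are 10 days old and counts how often each rule would roll the log.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

DAY_MS = 24 * 60 * 60 * 1000
ROLL_MS = 7 * DAY_MS  # 7-day roll interval, the default mentioned in the thread


@dataclass
class Segment:
    created_ms: int                           # wall-clock time the segment was created
    first_timestamp_ms: Optional[int] = None  # timestamp of its first message, if any
    count: int = 0                            # number of messages appended so far


def waited_wall_clock_vs_message(seg, now_ms, msg_ts_ms):
    # Assumed pre-fix rule: compare the wall clock with the first message timestamp.
    return now_ms - seg.first_timestamp_ms


def waited_message_only(seg, now_ms, msg_ts_ms):
    # Rule from the issue description: compare message timestamps only,
    # falling back to wall clock vs. create time if the first message has none.
    if seg.first_timestamp_ms is None:
        return now_ms - seg.created_ms
    return msg_ts_ms - seg.first_timestamp_ms


def waited_create_time(seg, now_ms, msg_ts_ms):
    # Rule suggested in the comment: compare the wall clock with segment create time.
    return now_ms - seg.created_ms


def count_rolls(waited: Callable, arrivals: List[Tuple[int, Optional[int]]]) -> int:
    """Replay (arrival wall-clock ms, message timestamp ms) pairs; return the roll count."""
    seg = Segment(created_ms=arrivals[0][0])
    rolls = 0
    for now_ms, ts in arrivals:
        # Only a non-empty segment can be rolled.
        if seg.count > 0 and waited(seg, now_ms, ts) > ROLL_MS:
            seg = Segment(created_ms=now_ms)
            rolls += 1
        if seg.count == 0:
            seg.first_timestamp_ms = ts
        seg.count += 1
    return rolls


if __name__ == "__main__":
    # Five messages arrive one second apart, each stamped 10 days in the past.
    relocated = [(100 * DAY_MS + i * 1000, 90 * DAY_MS + i * 1000) for i in range(5)]
    for rule in (waited_wall_clock_vs_message, waited_message_only, waited_create_time):
        print(rule.__name__, count_rolls(rule, relocated))
```

Under these assumptions the first rule rolls on every message after the first (4 rolls for 5 messages), while the message-timestamp rule and the create-time rule do not roll at all. The model does not cover the oscillating-timestamp case raised in the comment.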