[ https://issues.apache.org/jira/browse/KAFKA-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabien LD updated KAFKA-6872:
-----------------------------
    Priority: Minor  (was: Major)

> Doc for log.roll.* is wrong
> ---------------------------
>
>                 Key: KAFKA-6872
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6872
>             Project: Kafka
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.0.0
>            Reporter: Fabien LD
>            Priority: Minor
>
> For {{log.roll.ms}}, the doc says, for example:
> {quote}The maximum time before a new log segment is rolled out (in milliseconds). If not set, the value in log.roll.hours is used
> {quote}
> In other parts of the documentation (see [https://kafka.apache.org/10/documentation.html#upgrade_10_1_breaking]), it says:
> {quote}The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically. if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
> {quote}
> which is wrong. More specifically, the wrong part is:
> {quote}if the timestamp of the +first+ message in the segment is T
> {quote}
> What actually happens is:
> {quote}if the timestamp of the +last+ message in the segment is T
> {quote}
>
> A simple use case to reproduce this is to configure a single broker with:
> {code:java}
> # One partition ... or any small number should be fine
> num.partitions=1
> # 1GB segments (1073741824 bytes); the small test messages below never fill one
> log.segment.bytes=1073741824
> # Delete old segments when their last addition is 24h old
> log.retention.hours=24
> # Check the age of segments every 5 minutes
> log.retention.check.interval.ms=300000
> # Every hour (?!?!?), roll a new segment
> log.roll.hours=1
> {code}
> and loop on sending a small message (a few bytes, so that you never reach the segment size during the test) to one topic every minute; a minimal producer sketch is attached at the end of this description.
> After running for at least 24h, according to what the doc describes, one would expect to see ~24 segments (one new segment rolled every hour).
> But in fact there is only one log segment, containing all the records you sent (see the segment-listing sketch at the end of this description). Stop the producer for a bit more than one hour and restart it: a second segment is created per partition, because when the new record is appended, the previous record (the last one of what was until then the current segment) is more than 1h old.
> This proves that the doc should say:
> {quote}if the timestamp of the +last+ message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
> {quote}
>
> Notes:
> * As a DevOps engineer, I would prefer the doc to stay as it is and Kafka's behavior to be changed to match it. But I think both should be done: update the doc now, so that users of current versions know what to expect (and avoid running into the problem we faced), and fix Kafka's behavior later. Indeed, with the default conf ({{log.roll.hours=168}} and {{log.segment.bytes=1073741824}}), Kafka can keep very old records: pushing one small (~1k) record a day, about a million records fit in that segment, so it is never rotated.
> * I detected this on version 1.0.0 but assume it impacts many more versions than that one (very likely 1.1.0 too).
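>
> For reference, a minimal sketch of the producer loop used above. This is an illustration, not the exact code from our setup: the broker address {{localhost:9092}} and the topic name {{roll-test}} are assumptions.
> {code:java}
> import java.util.Properties;
> import org.apache.kafka.clients.producer.KafkaProducer;
> import org.apache.kafka.clients.producer.ProducerRecord;
> import org.apache.kafka.common.serialization.StringSerializer;
>
> public class RollTestProducer {
>     public static void main(String[] args) throws InterruptedException {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
>         props.put("key.serializer", StringSerializer.class.getName());
>         props.put("value.serializer", StringSerializer.class.getName());
>         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
>             while (true) {
>                 // a few bytes per record, far below log.segment.bytes
>                 producer.send(new ProducerRecord<>("roll-test", "ping"));
>                 producer.flush();
>                 Thread.sleep(60_000L); // one record per minute
>             }
>         }
>     }
> }
> {code}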
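> To count the rolled segments, list the {{*.log}} files of the partition directory (one file per segment). A small sketch, assuming the default {{log.dirs=/tmp/kafka-logs}} and the hypothetical topic {{roll-test}} from the sketch above:
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.util.List;
> import java.util.stream.Collectors;
> import java.util.stream.Stream;
>
> public class SegmentCount {
>     public static void main(String[] args) throws IOException {
>         // partition directory layout: <log.dirs>/<topic>-<partition>
>         Path dir = Paths.get("/tmp/kafka-logs/roll-test-0");
>         try (Stream<Path> files = Files.list(dir)) {
>             List<Path> segments = files
>                     .filter(p -> p.getFileName().toString().endsWith(".log"))
>                     .sorted()
>                     .collect(Collectors.toList());
>             segments.forEach(System.out::println);
>             System.out.println(segments.size() + " segment(s)");
>         }
>     }
> }
> {code}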
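> Finally, to make the two readings of the doc concrete, here is the difference spelled out as code. This is an illustration only, not Kafka's actual rolling code:
> {code:java}
> public class RollSemantics {
>     // "first message" reading (what the doc says): roll once a new record's
>     // timestamp passes the FIRST record's timestamp by log.roll.ms. With one
>     // record per minute and log.roll.hours=1 this would give ~24 segments a day.
>     static boolean rollDueFirstReading(long firstTs, long newTs, long rollMs) {
>         return newTs >= firstTs + rollMs;
>     }
>
>     // "last message" reading (the observed behavior): roll only when a new
>     // record's timestamp passes the LAST record's timestamp by log.roll.ms.
>     // One record per minute never leaves a 1h gap, hence the single segment.
>     static boolean rollDueLastReading(long lastTs, long newTs, long rollMs) {
>         return newTs >= lastTs + rollMs;
>     }
> }
> {code}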