[ 
https://issues.apache.org/jira/browse/KAFKA-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906349#comment-15906349
 ] 

Eric Bolinger commented on KAFKA-3806:
--------------------------------------

People learn of the interaction between log.retention.hours and 
offsets.retention.minutes through experience rather than documentation.  Then 
they search the internet for clues until they come upon this ticket.  _If the 
Kafka cluster sets both properties to the same duration_ (accounting for hours 
vs minutes), then you have a stable solution.  Any client that always starts 
with `auto.offset.reset=earliest` will behave consistently.

When the offsets retention period is less than log retention, you have a window 
where the client behavior is unstable.  The absolute values are subjective and 
vary for different use cases. 
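
To put a rough number on that window, here is a minimal sketch (not 
something Kafka provides).  The function name is made up for illustration, 
the inputs are the defaults quoted in the issue description below, and it 
assumes time-based retention is the only thing removing log data.

```python
def unstable_window_minutes(log_retention_hours: int,
                            offsets_retention_minutes: int) -> int:
    """Minutes during which log data can outlive an idle group's offsets.

    Hypothetical helper: converts log retention to minutes and returns the
    gap, or 0 when offsets are retained at least as long as the log.
    """
    log_retention_minutes = log_retention_hours * 60
    return max(0, log_retention_minutes - offsets_retention_minutes)


# Defaults quoted in the ticket: 168 hours and 1440 minutes.
window = unstable_window_minutes(168, 1440)
print(window, "minutes =", window // (24 * 60), "days")  # 8640 minutes = 6 days

# Setting both properties to the same duration closes the window.
print(unstable_window_minutes(168, 168 * 60))  # 0
```

With the defaults, a consumer group that stops committing has a six-day 
stretch in which its offsets are gone but the messages are still there.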

The scenario where multiple people are running intermittent console consumers 
points to a need for the client to override the server offsets retention 
period.  If the console consumer specifies a new consumer group every time, 
then it should also specify a short offsets.retention.minutes.

The solution where clients periodically commit offsets is just poor design.  
The offsets retention period is managed by the cluster, while the clients 
(plural) must individually manage their own update intervals.  What happens 
when the values change?  What are the trade-offs?  Why is the client doing work 
to manage memory in the cluster nodes?  Count the leaky abstractions here...


> Adjust default values of log.retention.hours and offsets.retention.minutes
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-3806
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3806
>             Project: Kafka
>          Issue Type: Improvement
>          Components: config
>    Affects Versions: 0.9.0.1, 0.10.0.0
>            Reporter: Michal Turek
>            Priority: Minor
>
> The combination of the default values of log.retention.hours (168 hours = 
> 7 days) and offsets.retention.minutes (1440 minutes = 1 day) may be dangerous 
> in special cases. Offset retention should always be greater than log retention.
> We have observed the following scenario and issue:
> - Production of data to a topic was disabled two days ago by a producer 
> update; the topic wasn't deleted.
> - The consumer consumed all data and properly committed offsets to Kafka.
> - The consumer made no more offset commits for that topic because there was 
> no more incoming data and nothing to confirm. (We have auto-commit disabled; 
> I'm not sure how it behaves with auto-commit enabled.)
> - After one day: Kafka cleared the expired offsets according to 
> offsets.retention.minutes.
> - After two days: The long-running consumer was restarted after an update. It 
> didn't find any committed offsets for that topic, since they had been deleted 
> per offsets.retention.minutes, so it started consuming from the beginning.
> - The messages were still in Kafka due to larger log.retention.hours, about 5 
> days of messages were read again.
> Known workaround to solve this issue:
> - Explicitly configure log.retention.hours and offsets.retention.minutes, 
> don't use defaults.
> Proposals:
> - Increase the default value of offsets.retention.minutes to at least twice 
> the duration of log.retention.hours.
> - Check these values during Kafka startup and log a warning if 
> offsets.retention.minutes is smaller than log.retention.hours.
> - Add a note to the migration guide about the differences between storing 
> offsets in ZooKeeper and in Kafka 
> (http://kafka.apache.org/documentation.html#upgrade).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
