[ 
https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095940#comment-14095940
 ] 

Joel Koshy commented on KAFKA-1510:
-----------------------------------

After some discussion with [~guozhang] and [~junrao] here are some additional 
comments to help clarify my earlier reasoning:

In order to migrate offsets from ZooKeeper to Kafka, at minimum we need to 
force an unfiltered commit (regardless of whether offsets have changed or not) 
at some point, e.g., at consumer shutdown.

An orthogonal issue is that of a consumer that consumes a low-volume topic: 
if the offsets don't change within the offset retention threshold on the 
offset manager (defaults to one day), those offsets will be deleted. If the 
consumer then fails for any reason and does an offset fetch, it will find no 
committed offset and reset to earliest or latest. We have a few options:
* One possible approach to address this is to configure the broker-side offset 
retention period to a large value, i.e., larger than the maximum retention 
period of any topic. This is not ideal because: (a) if there are short-lived 
(say, console) consumers that come and go often, their offsets can sit 
around for a long time; (b) in general, you cannot really come up with a 
retention period for a compacted topic. So I would not want to do this, but I 
am listing it here for completeness.
* Another approach is to do unfiltered commits if offsets.storage is set to 
kafka, i.e., always commit everything.
* Yet another approach is to do unfiltered commits at a configurable interval.
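
To make the difference between the last two options concrete, here is a rough 
sketch in Java. It is not the actual consumer code: the class and method names 
(OffsetCommitSketch, selectForCommit, forceEvery) are hypothetical, and 
partitions are keyed by a plain string for brevity. A filtered commit sends 
only offsets that changed since the last commit; an unfiltered commit sends 
everything, which refreshes the entries on the offset manager. Setting 
forceEvery to 1 corresponds to the second option (commit everything always); 
a larger value approximates the third option, with the interval counted in 
commits rather than in time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of commit filtering; not Kafka's actual implementation. */
final class OffsetCommitSketch {
    // Offsets as of the last commit, keyed by an assumed "topic-partition" string.
    private final Map<String, Long> lastCommitted = new HashMap<>();
    // Every forceEvery-th commit is unfiltered; 1 means every commit is unfiltered.
    private final int forceEvery;
    private int commitsSinceForce = 0;

    OffsetCommitSketch(int forceEvery) {
        if (forceEvery < 1) {
            throw new IllegalArgumentException("forceEvery must be >= 1");
        }
        this.forceEvery = forceEvery;
    }

    /**
     * Returns the offsets that would be sent in this commit.
     * Pass force = true for events such as shutdown or rebalance.
     */
    Map<String, Long> selectForCommit(Map<String, Long> current, boolean force) {
        commitsSinceForce++;
        boolean unfiltered = force || commitsSinceForce >= forceEvery;

        Map<String, Long> toCommit = new HashMap<>();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long previous = lastCommitted.get(entry.getKey());
            // Filtered mode skips offsets that have not moved since the last commit.
            if (unfiltered || !entry.getValue().equals(previous)) {
                toCommit.put(entry.getKey(), entry.getValue());
            }
        }

        if (unfiltered) {
            commitsSinceForce = 0;
        }
        lastCommitted.putAll(toCommit);
        return toCommit;
    }
}
```

With forceEvery set to 3 and an idle partition, the first commit sends the 
offset (nothing was committed before), the second sends nothing, and the third 
sends it again even though it has not moved.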

Thoughts?

My preference after thinking about it is to go with the second approach 
(always do unfiltered commits when offsets.storage is set to kafka).


> Force offset commits when migrating consumer offsets from zookeeper to kafka
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-1510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1510
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>              Labels: newbie
>             Fix For: 0.8.2
>
>         Attachments: kafka-1510.patch
>
>
> When migrating consumer offsets from ZooKeeper to Kafka, we have to turn on 
> dual-commit (i.e., the consumers will commit offsets to both ZooKeeper and 
> Kafka) in addition to setting offsets.storage to kafka. However, when we 
> commit offsets we only commit offsets if they have changed (since the last 
> commit). For low-volume topics or for topics that receive data in bursts, 
> offsets may not move for a long period of time. Therefore we may want to 
> force the commit (even if offsets have not changed) when migrating (i.e., 
> when dual-commit is enabled) - we can add a minimum interval threshold (say 
> force commit after every 10 auto-commits) as well as on rebalance and 
> shutdown.
> Also, I think it is safe to switch the default for offsets.storage from 
> zookeeper to kafka and set the default to dual-commit (for people who have 
> not migrated yet). We have deployed this to the largest consumers at LinkedIn 
> and have not seen any issues so far (except for the migration caveat that 
> this JIRA will resolve).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
