[
https://issues.apache.org/jira/browse/KAFKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240608#comment-16240608
]
Guozhang Wang commented on KAFKA-2758:
--------------------------------------
Guozhang Wang commented on KAFKA-2758:
--------------------------------------
[~jjkoshy] That's a good point. The main motivation for 1) is services like
MM (MirrorMaker), where a commit request may contain a large number of
partitions, many of which carry the same offsets as the previous commit; the
hope is to reduce the request size in such scenarios. I'm wondering whether
that is still a good trade-off given the complexity of modifying the
server-side offset-commit handling to update the timestamps for this group id
(I think that primarily depends on how much network bandwidth we can actually
save in practice).
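
To make the bandwidth trade-off concrete, here is a minimal client-side sketch
of the filtering in 1), written against the Java consumer API. The helper name
is hypothetical, and note it sidesteps the server-side question raised above:
partitions filtered out of the request would not get their timestamps
refreshed by the broker.

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitFilter {
    // Build a commit map that skips partitions whose consume position has
    // not advanced past the last committed offset.
    static Map<TopicPartition, OffsetAndMetadata> changedOffsetsOnly(
            KafkaConsumer<?, ?> consumer) {
        Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
        for (TopicPartition tp : consumer.assignment()) {
            long position = consumer.position(tp);                // next offset to consume
            OffsetAndMetadata committed = consumer.committed(tp); // null if never committed
            if (committed == null || committed.offset() != position)
                toCommit.put(tp, new OffsetAndMetadata(position));
        }
        return toCommit;
    }
}
{code}

A caller would then issue {{consumer.commitSync(changedOffsetsOnly(consumer))}};
for an MM-style consumer with thousands of assigned partitions but progress on
only a few, the request shrinks accordingly.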
> Improve Offset Commit Behavior
> ------------------------------
>
> Key: KAFKA-2758
> URL: https://issues.apache.org/jira/browse/KAFKA-2758
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: Guozhang Wang
> Labels: newbiee, reliability
>
> There are two scenarios of offset committing that we can improve:
> 1) We can filter out the partitions whose committed offset is equal to the
> consumed offset, meaning there are no newly consumed messages from that
> partition and hence we do not need to include it in the commit request.
> 2) We can make a commit request right after resetting to a fetch / consume
> position, whether according to the reset policy (e.g. on consumer startup,
> or when handling an out-of-range offset, etc.) or through {{seek()}} (see
> the sketch after this quoted description), so that if the consumer fails
> right after these events, upon recovery it can restart from the reset
> position instead of resetting again; otherwise this can lead to, for
> example, data loss if we use "largest" as the reset policy while new
> messages keep arriving on the fetched partitions.
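
As an illustration of 2), an application can already pin a reset position by
committing immediately after the seek, so that a crash before the next commit
does not trigger another reset. A short sketch against the Java consumer API
follows; the helper name is hypothetical:

{code:java}
import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SeekCommit {
    // Seek to a reset position and commit it right away, so that on failure
    // the consumer restarts from this offset instead of re-applying the
    // reset policy (which under "largest" could skip newly arrived messages).
    static void seekAndCommit(KafkaConsumer<?, ?> consumer,
                              TopicPartition tp, long offset) {
        consumer.seek(tp, offset);
        consumer.commitSync(
            Collections.singletonMap(tp, new OffsetAndMetadata(offset)));
    }
}
{code}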