Thanks for the KIP.
A somewhat orthogonal question I have is, if we should try to merge this
KIP with the existing KIP-842:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms
While KIP-842 has a different intention, ie, separating the case of "no
offset during initial startup" vs "offset out of bounds during
processing", it might still be worth it? My motivation for this question
is, that `auto.offset.reset` is one of the most central configs we have,
and thus, doing a single larger change (ie, both KIPs together) might be
better (less noise) than doing two independent changes? Thoughts? I know
it might be a little bit of a stretch, but asking cannot hurt :)
About the KIP itself, did we consider to add something like
"latest-local" to say reset to oldest local, but not fetch from tiered
storage?
For the format of the config, did we consider what we do in
`bin/kafka-consumer-group.sh`? It has multiple different options like
--by-duration <duration: format `PnDTnHnMnS`>
--shift-by <number-of-records>
--to-datetime <fixed point in time: format `YYYY-MM-DDTHH:mm:SS.sss`>
--to-offset <absolute offset>
As it also uses ISO format, it might be good to also use them (even if I
can see the appeal and simplicity of what you proposed).
I also want to add to (AM1), as there is also `Topology.AutoOffsetReset`
enum in Kafka Streams. I think we would need to convert this into a
class. While it could be done is a follow up KIP, too, it seems best to
do this holistically in a single KIP, because KS is not something on-top
of Kafka, but it's part of Kafka. I am happy to help with the design and
even PRs if necessary, but would strongly prefer to do it all in a
single KIP.
Btw: if we add something like "latest-local", it might also be good to
extend `bin/kafka-consumer-group.sh` accordingly (even if the tool works
slightly different, as it does commit an offset and there could be some
race condition between committing "latest-local", tiering, and when the
consumer is actually started?
-Matthias
On 11/4/24 7:38 AM, Apoorv Mittal wrote:
Hi Manikumar,
Thanks for the KIP, this new strategy would be helpful in specifying fetch
behaviour.
AM1: The config `auto.offset.reset` is currently applied as per the enum
class OffsetResetStarategy which is part of kafka-clients javadoc
<https://javadoc.io/doc/org.apache.kafka/kafka-clients/latest/index.html>.
Are we also proposing to somehow add new definitions in the same class?
However as new configurations will be a string representation hence are we
moving away from OffsetResetStartegy enum class altogether? Should we
include the change in the KIP as OffsetResetStarategy is part of public
javadoc?
AM2: While I can see the ISO-8601 format is in the rejected alternative,
should we not follow some standard of defining duration which has already
been adopted in other systems?
AM3: We've introduced new config values using the format minus-n-hours,
minus-n-days, minus-n-months, and minus-n-years. Should we explicitly
define the "minus" prefix, or is it implied?
AM4: When supporting duration-based resets, should we also allow users to
specify a specific checkpoint time? For example, if a checkpoint occurs 2
days, 5 hours and 30 minutes earlier, the current four formats
(minus-n-hours, minus-n-days, minus-n-months, minus-n-years) might not be
sufficient. Should we consider adding a format to accommodate specific
checkpoint times, or is there a reason to limit the supported formats?
Regards,
Apoorv Mittal
On Mon, Nov 4, 2024 at 9:23 AM Manikumar <manikumar.re...@gmail.com> wrote:
Hi everyone,
I would like to start a discussion on KIP-1106:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1106%3A+Add+duration+based+offset+reset+option+for+consumer+clients
This KIP proposes to add an additional auto offset reset strategy for
consumer clients.
Regards,
Manikumar