Hi Matthias, Thanks for the review.
MS1: looks like KIP-842 is stuck in the voting. Current changes should fit into KIP-842 (few updates definitely required) as KIP-842 is proposing to add new configs (auto.offset.reset.on.no.initial.offset and auto.offset.reset.on.invalid.offset). I prefer to revive KIP-842 after completing this KIP. MS2: Added support for earliest-local config to reset to oldest local offset and changes to kafka-consumer-group cmd. MS3: Based on suggestions, I have updated the config option to support ISO format. MS4: I have included the AutoOffsetReset changes to the KIP. I definitely need your help on refining the API/solution for KS :) Thanks On Tue, Nov 5, 2024 at 7:26 AM Matthias J. Sax <mj...@apache.org> wrote: > > Thanks for the KIP. > > A somewhat orthogonal question I have is, if we should try to merge this > KIP with the existing KIP-842: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms > > While KIP-842 has a different intention, ie, separating the case of "no > offset during initial startup" vs "offset out of bounds during > processing", it might still be worth it? My motivation for this question > is, that `auto.offset.reset` is one of the most central configs we have, > and thus, doing a single larger change (ie, both KIPs together) might be > better (less noise) than doing two independent changes? Thoughts? I know > it might be a little bit of a stretch, but asking cannot hurt :) > > > About the KIP itself, did we consider to add something like > "latest-local" to say reset to oldest local, but not fetch from tiered > storage? > > For the format of the config, did we consider what we do in > `bin/kafka-consumer-group.sh`? It has multiple different options like > > --by-duration <duration: format `PnDTnHnMnS`> > --shift-by <number-of-records> > > --to-datetime <fixed point in time: format `YYYY-MM-DDTHH:mm:SS.sss`> > --to-offset <absolute offset> > > As it also uses ISO format, it might be good to also use them (even if I > can see the appeal and simplicity of what you proposed). > > > I also want to add to (AM1), as there is also `Topology.AutoOffsetReset` > enum in Kafka Streams. I think we would need to convert this into a > class. While it could be done is a follow up KIP, too, it seems best to > do this holistically in a single KIP, because KS is not something on-top > of Kafka, but it's part of Kafka. I am happy to help with the design and > even PRs if necessary, but would strongly prefer to do it all in a > single KIP. > > > Btw: if we add something like "latest-local", it might also be good to > extend `bin/kafka-consumer-group.sh` accordingly (even if the tool works > slightly different, as it does commit an offset and there could be some > race condition between committing "latest-local", tiering, and when the > consumer is actually started? > > > -Matthias > > > On 11/4/24 7:38 AM, Apoorv Mittal wrote: > > Hi Manikumar, > > Thanks for the KIP, this new strategy would be helpful in specifying fetch > > behaviour. > > > > AM1: The config `auto.offset.reset` is currently applied as per the enum > > class OffsetResetStarategy which is part of kafka-clients javadoc > > <https://javadoc.io/doc/org.apache.kafka/kafka-clients/latest/index.html>. > > Are we also proposing to somehow add new definitions in the same class? > > However as new configurations will be a string representation hence are we > > moving away from OffsetResetStartegy enum class altogether? Should we > > include the change in the KIP as OffsetResetStarategy is part of public > > javadoc? > > > > AM2: While I can see the ISO-8601 format is in the rejected alternative, > > should we not follow some standard of defining duration which has already > > been adopted in other systems? > > > > AM3: We've introduced new config values using the format minus-n-hours, > > minus-n-days, minus-n-months, and minus-n-years. Should we explicitly > > define the "minus" prefix, or is it implied? > > > > AM4: When supporting duration-based resets, should we also allow users to > > specify a specific checkpoint time? For example, if a checkpoint occurs 2 > > days, 5 hours and 30 minutes earlier, the current four formats > > (minus-n-hours, minus-n-days, minus-n-months, minus-n-years) might not be > > sufficient. Should we consider adding a format to accommodate specific > > checkpoint times, or is there a reason to limit the supported formats? > > > > > > Regards, > > Apoorv Mittal > > > > > > On Mon, Nov 4, 2024 at 9:23 AM Manikumar <manikumar.re...@gmail.com> wrote: > > > >> Hi everyone, > >> I would like to start a discussion on KIP-1106: > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1106%3A+Add+duration+based+offset+reset+option+for+consumer+clients > >> > >> This KIP proposes to add an additional auto offset reset strategy for > >> consumer clients. > >> > >> Regards, > >> Manikumar > >> > >