Thank you both for the great feedback. I like the "fancy" proposal :), and how it removes the need for additional API methods. And with a feature flag on `StateStore`, disabled by default, should no break current users.
The only side-effect I can think of is that: by moving the flag upwards, all later operations become affected; which might be ok for most (all?) cases. I can't think of an scenario where this would be an issue, just want to point this out. If moving to this approach, I'd like to check if I got this right before updating the KIP: - only `StateStore` will change by having a new method: `backwardIteration()`, `false` by default to keep things compatible. - then all `*Stores` will have to update their implementation based on this flag. On 20/05/2020 21:02, Sophie Blee-Goldman wrote: >> There's no possibility that someone could be relying >> on iterating over that range in increasing order, because that's not what >> happens. However, they could indeed be relying on getting an empty > iterator > > I just meant that they might be relying on the assumption that the range > query > will never return results with decreasing keys. The empty iterator wouldn't > break that contract, but of course a surprise reverse iterator would. > > FWIW I actually am in favor of automatically converting to a reverse > iterator, > I just thought we should consider whether this should be off by default or > even possible to disable at all. > > On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org> wrote: > >> Thanks for the response, Sophie, >> >> I wholeheartedly agree we should take as much into account as possible >> up front, rather than regretting our decisions later. I actually do share >> your vague sense of worry, which was what led me to say initially that I >> thought my counterproposal might be "too fancy". Sometimes, it's better >> to be explicit instead of "elegant", if we think more people will be >> confused >> than not. >> >> I really don't think that there's any danger of "relying on a bug" here, >> although >> people certainly could be relying on current behavior. One thing to be >> clear >> about (which I just left a more detailed comment in KAFKA-8159 about) is >> that >> when we say something like key1 > key2, this ordering is defined by the >> serde's output and nothing else. >> >> Currently, thanks to your fix in https://github.com/apache/kafka/pull/6521 >> , >> the store contract is that for range scans, if from > to, then the store >> must >> return an empty iterator. There's no possibility that someone could be >> relying >> on iterating over that range in increasing order, because that's not what >> happens. However, they could indeed be relying on getting an empty >> iterator. >> >> My counterproposal was to actually change this contract to say that the >> store >> must return an iterator over the keys in that range, but in the reverse >> order. >> So, in addition to considering whether this idea is "too fancy" (aka >> confusing), >> we should also consider the likelihood of breaking an existing program with >> this behavior/contract change. >> >> To echo your clarification, I'm also not advocating strongly in favor of my >> proposal. I just wanted to present it for consideration alongside Jorge's >> original one. >> >> Thanks for raising these very good points, >> -John >> >> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote: >>>> Rather than working around it, I think we should just fix it >>> Now *that's* a "fancy" idea :P >>> >>> That was my primary concern, although I do have a vague sense of worry >>> that we might be allowing users to get into trouble without realizing it. >>> For example if their custom serdes suffer a similar bug as the above, >>> and/or >>> they rely on getting results in increasing order (of the keys) even when >>> to < from. Maybe they're relying on the fact that the range query returns >>> nothing in that case. >>> >>> Not sure if that qualifies as relying on a bug or not, but in that past >>> we've >>> taken the stance that we should not break compatibility even if the user >>> was relying on bugs or unintentional behavior. >>> >>> Just to clarify I'm not advocating strongly against this proposal, just >>> laying >>> out some considerations we should take into account. At the end of the >> day >>> we should do what's right rather than maintain compatibility with >> existing >>> bugs, but sometimes there's a reasonable middle ground. >>> >>> On Tue, May 19, 2020 at 6:15 PM John Roesler <vvcep...@apache.org> >> wrote: >>>> Thanks Sophie, >>>> >>>> Woah, that’s a nasty bug. Rather than working around it, I think we >> should >>>> just fix it. I’ll leave some comments on the Jira. >>>> >>>> It doesn’t seem like it should be this KIP’s concern that some serdes >>>> might be incorrectly written. >>>> >>>> Were there other practical concerns that you had in mind? >>>> >>>> Thanks, >>>> John >>>> >>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote: >>>>> I like this "fancy idea" to just flip the to/from bytes but I think >> there >>>>> are some practical limitations to implementing this. In particular >>>>> I'm thinking about this issue >>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the built-in >>>> signed >>>>> number serdes. >>>>> >>>>> This trick would actually fix the problem for negative-negative >> queries >>>>> (ie where to & from are negative) but would cause undetectable >>>>> incorrect results for negative-positive queries. For example, say you >>>>> call #range with from = -1 and to = 1, using the Short serdes. The >>>>> serialized bytes for that are >>>>> >>>>> from = 1111111111111111 >>>>> to = 0000000000000001 >>>>> >>>>> so we would end up flipping those and iterating over all keys from >>>>> 0000000000000001 to 1111111111111111. Iterating in lexicographical >>>>> order means we would iterate over every key in the space *except* for >>>>> 0, but 0 is actually the *only* other key we meant to be included in >> the >>>>> range query. >>>>> >>>>> Currently we just log a warning and return an empty iterator when >>>>> to < from, which is obviously also incorrect but feels slightly more >>>>> palatable. If we start automatically converting to reverse queries we >>>>> can't even log a warning in this case unless we wanted to log a >> warning >>>>> every time, which would be weird to do for a valid usage of a new >>>>> feature. >>>>> >>>>> All that said, I still like the idea overall. Off the top of my head >> I >>>> guess >>>>> we could add a store config to enable/disable automatic reverse >>>> iteration, >>>>> which is off by default? >>>>> >>>>> Thanks for the KIP! This will be a nice addition >>>>> >>>>> Sophie >>>>> >>>>> >>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <vvcep...@apache.org> >>>> wrote: >>>>>> Hi there Jorge, >>>>>> >>>>>> Thanks for the KIP! >>>>>> >>>>>> I think this feature sounds very reasonable. >>>>>> >>>>>> I'm not 100% sure if this is "too fancy", but what do you think >>>>>> about avoiding the enum by instead allowing people to flip >>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" would >>>>>> be a forward scan, and from "Z" to "A" would be a backward one? >>>>>> >>>>>> Thanks, >>>>>> -John >>>>>> >>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote: >>>>>>> Hi everyone, >>>>>>> >>>>>>> I would like to start the discussion for KIP-617: >>>>>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards >>>>>>> Looking forward to your feedback. >>>>>>> >>>>>>> Thanks! >>>>>>> Jorge. >>>>>>> >>>>>>> >>>>>>> Attachments: >>>>>>> * 0x5F2C6E22064982DF.asc
0x5F2C6E22064982DF.asc
Description: application/pgp-keys