Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Jorge Quilcate Wed, 20 May 2020 15:37:45 -0700

Thank you both for the great feedback.

I like the "fancy" proposal :), and how it removes the need for
additional API methods. And with a feature flag on `StateStore`,
disabled by default, should no break current users.


The only side-effect I can think of is that: by moving the flag upwards,
all later operations become affected; which might be ok for most (all?)
cases. I can't think of an scenario where this would be an issue, just
want to point this out.

If moving to this approach, I'd like to check if I got this right before
updating the KIP:

- only `StateStore` will change by having a new method:
`backwardIteration()`, `false` by default to keep things compatible.
- then all `*Stores` will have to update their implementation based on
this flag.


On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>> There's no possibility that someone could be relying
>> on iterating over that range in increasing order, because that's not what
>> happens. However, they could indeed be relying on getting an empty
> iterator
>
> I just meant that they might be relying on the assumption that the range
> query
> will never return results with decreasing keys. The empty iterator wouldn't
> break that contract, but of course a surprise reverse iterator would.
>
> FWIW I actually am in favor of automatically converting to a reverse
> iterator,
> I just thought we should consider whether this should be off by default or
> even possible to disable at all.
>
> On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org> wrote:
>
>> Thanks for the response, Sophie,
>>
>> I wholeheartedly agree we should take as much into account as possible
>> up front, rather than regretting our decisions later. I actually do share
>> your vague sense of worry, which was what led me to say initially that I
>> thought my counterproposal might be "too fancy". Sometimes, it's better
>> to be explicit instead of "elegant", if we think more people will be
>> confused
>> than not.
>>
>> I really don't think that there's any danger of "relying on a bug" here,
>> although
>> people certainly could be relying on current behavior. One thing to be
>> clear
>> about (which I just left a more detailed comment in KAFKA-8159 about) is
>> that
>> when we say something like key1 > key2, this ordering is defined by the
>> serde's output and nothing else.
>>
>> Currently, thanks to your fix in https://github.com/apache/kafka/pull/6521
>> ,
>> the store contract is that for range scans, if from > to, then the store
>> must
>> return an empty iterator. There's no possibility that someone could be
>> relying
>> on iterating over that range in increasing order, because that's not what
>> happens. However, they could indeed be relying on getting an empty
>> iterator.
>>
>> My counterproposal was to actually change this contract to say that the
>> store
>> must return an iterator over the keys in that range, but in the reverse
>> order.
>> So, in addition to considering whether this idea is "too fancy" (aka
>> confusing),
>> we should also consider the likelihood of breaking an existing program with
>> this behavior/contract change.
>>
>> To echo your clarification, I'm also not advocating strongly in favor of my
>> proposal. I just wanted to present it for consideration alongside Jorge's
>> original one.
>>
>> Thanks for raising these very good points,
>> -John
>>
>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
>>>> Rather than working around it, I think we should just fix it
>>> Now *that's* a "fancy" idea :P
>>>
>>> That was my primary concern, although I do have a vague sense of worry
>>> that we might be allowing users to get into trouble without realizing it.
>>> For example if their custom serdes suffer a similar bug as the above,
>>> and/or
>>> they rely on getting results in increasing order (of the keys) even when
>>> to < from. Maybe they're relying on the fact that the range query returns
>>> nothing in that case.
>>>
>>> Not sure if that qualifies as relying on a bug or not, but in that past
>>> we've
>>> taken the stance that we should not break compatibility even if the user
>>> was relying on bugs or unintentional behavior.
>>>
>>> Just to clarify I'm not advocating strongly against this proposal, just
>>> laying
>>> out some considerations we should take into account. At the end of the
>> day
>>> we should do what's right rather than maintain compatibility with
>> existing
>>> bugs, but sometimes there's a reasonable middle ground.
>>>
>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <vvcep...@apache.org>
>> wrote:
>>>> Thanks Sophie,
>>>>
>>>> Woah, that’s a nasty bug. Rather than working around it, I think we
>> should
>>>> just fix it. I’ll leave some comments on the Jira.
>>>>
>>>> It doesn’t seem like it should be this KIP’s concern that some serdes
>>>> might be incorrectly written.
>>>>
>>>> Were there other practical concerns that you had in mind?
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote:
>>>>> I like this "fancy idea" to just flip the to/from bytes but I think
>> there
>>>>> are some practical limitations to implementing this. In particular
>>>>> I'm thinking about this issue
>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the built-in
>>>> signed
>>>>> number serdes.
>>>>>
>>>>> This trick would actually fix the problem for negative-negative
>> queries
>>>>> (ie where to & from are negative) but would cause undetectable
>>>>> incorrect results for negative-positive queries. For example, say you
>>>>> call #range with from = -1 and to = 1, using the Short serdes. The
>>>>> serialized bytes for that are
>>>>>
>>>>> from = 1111111111111111
>>>>> to = 0000000000000001
>>>>>
>>>>> so we would end up flipping those and iterating over all keys from
>>>>> 0000000000000001 to 1111111111111111. Iterating in lexicographical
>>>>> order means we would iterate over every key in the space *except* for
>>>>> 0, but 0 is actually the *only* other key we meant to be included in
>> the
>>>>> range query.
>>>>>
>>>>> Currently we just log a warning and return an empty iterator when
>>>>> to < from, which is obviously also incorrect but feels slightly more
>>>>> palatable. If we start automatically converting to reverse queries we
>>>>> can't even log a warning in this case unless we wanted to log a
>> warning
>>>>> every time, which would be weird to do for a valid usage of a new
>>>>> feature.
>>>>>
>>>>> All that said, I still like the idea overall. Off the top of my head
>> I
>>>> guess
>>>>> we could add a store config to enable/disable automatic reverse
>>>> iteration,
>>>>> which is off by default?
>>>>>
>>>>> Thanks for the KIP! This will be a nice addition
>>>>>
>>>>> Sophie
>>>>>
>>>>>
>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <vvcep...@apache.org>
>>>> wrote:
>>>>>> Hi there Jorge,
>>>>>>
>>>>>> Thanks for the KIP!
>>>>>>
>>>>>> I think this feature sounds very reasonable.
>>>>>>
>>>>>> I'm not 100% sure if this is "too fancy", but what do you think
>>>>>> about avoiding the enum by instead allowing people to flip
>>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" would
>>>>>> be a forward scan, and from "Z" to "A" would be a backward one?
>>>>>>
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I would like to start the discussion for KIP-617:
>>>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>>>>>>> Looking forward to your feedback.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Jorge.
>>>>>>>
>>>>>>>
>>>>>>> Attachments:
>>>>>>> * 0x5F2C6E22064982DF.asc

0x5F2C6E22064982DF.asc
Description: application/pgp-keys

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Reply via email to