Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

John Roesler Thu, 21 May 2020 07:34:10 -0700

Hi Jorge,

Thanks for that idea. I agree, a feature flag would protect anyone
who may be depending on the current behavior.


It seems better to locate the feature flag in the initialization logic of
the store, rather than have a method on the "live" store that changes
its behavior on the fly.

It seems like there are two options here, one is to add a new config:

StreamsConfig.ENABLE_BACKWARDS_ITERATION =
  "enable.backwards.iteration

Or we can add a feature flag in Materialized, like

Materialized.enableBackwardsIteration()

I think I'd personally lean toward the config, for the following reason.
The concern that Sophie raised is that someone's program may depend
on the existing contract of getting an empty iterator. We don't want to
switch behavior when they aren't expecting it, so we provide them a
config to assert that they _are_ expecting the new behavior, which
means they take responsibility for updating their code to expect the new
behavior.

There doesn't seem to be a reason to offer a choice of behaviors on a
per-query, or per-store basis. We just want people to be not surprised
by this change in general.

What do you think?
Thanks,
-John

On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
> Thank you both for the great feedback.
> 
> I like the "fancy" proposal :), and how it removes the need for
> additional API methods. And with a feature flag on `StateStore`,
> disabled by default, should no break current users.
> 
> The only side-effect I can think of is that: by moving the flag upwards,
> all later operations become affected; which might be ok for most (all?)
> cases. I can't think of an scenario where this would be an issue, just
> want to point this out.
> 
> If moving to this approach, I'd like to check if I got this right before
> updating the KIP:
> 
> - only `StateStore` will change by having a new method:
> `backwardIteration()`, `false` by default to keep things compatible.
> - then all `*Stores` will have to update their implementation based on
> this flag.
> 
> 
> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> >> There's no possibility that someone could be relying
> >> on iterating over that range in increasing order, because that's not what
> >> happens. However, they could indeed be relying on getting an empty
> > iterator
> >
> > I just meant that they might be relying on the assumption that the range
> > query
> > will never return results with decreasing keys. The empty iterator wouldn't
> > break that contract, but of course a surprise reverse iterator would.
> >
> > FWIW I actually am in favor of automatically converting to a reverse
> > iterator,
> > I just thought we should consider whether this should be off by default or
> > even possible to disable at all.
> >
> > On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org> wrote:
> >
> >> Thanks for the response, Sophie,
> >>
> >> I wholeheartedly agree we should take as much into account as possible
> >> up front, rather than regretting our decisions later. I actually do share
> >> your vague sense of worry, which was what led me to say initially that I
> >> thought my counterproposal might be "too fancy". Sometimes, it's better
> >> to be explicit instead of "elegant", if we think more people will be
> >> confused
> >> than not.
> >>
> >> I really don't think that there's any danger of "relying on a bug" here,
> >> although
> >> people certainly could be relying on current behavior. One thing to be
> >> clear
> >> about (which I just left a more detailed comment in KAFKA-8159 about) is
> >> that
> >> when we say something like key1 > key2, this ordering is defined by the
> >> serde's output and nothing else.
> >>
> >> Currently, thanks to your fix in https://github.com/apache/kafka/pull/6521
> >> ,
> >> the store contract is that for range scans, if from > to, then the store
> >> must
> >> return an empty iterator. There's no possibility that someone could be
> >> relying
> >> on iterating over that range in increasing order, because that's not what
> >> happens. However, they could indeed be relying on getting an empty
> >> iterator.
> >>
> >> My counterproposal was to actually change this contract to say that the
> >> store
> >> must return an iterator over the keys in that range, but in the reverse
> >> order.
> >> So, in addition to considering whether this idea is "too fancy" (aka
> >> confusing),
> >> we should also consider the likelihood of breaking an existing program with
> >> this behavior/contract change.
> >>
> >> To echo your clarification, I'm also not advocating strongly in favor of my
> >> proposal. I just wanted to present it for consideration alongside Jorge's
> >> original one.
> >>
> >> Thanks for raising these very good points,
> >> -John
> >>
> >> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
> >>>> Rather than working around it, I think we should just fix it
> >>> Now *that's* a "fancy" idea :P
> >>>
> >>> That was my primary concern, although I do have a vague sense of worry
> >>> that we might be allowing users to get into trouble without realizing it.
> >>> For example if their custom serdes suffer a similar bug as the above,
> >>> and/or
> >>> they rely on getting results in increasing order (of the keys) even when
> >>> to < from. Maybe they're relying on the fact that the range query returns
> >>> nothing in that case.
> >>>
> >>> Not sure if that qualifies as relying on a bug or not, but in that past
> >>> we've
> >>> taken the stance that we should not break compatibility even if the user
> >>> was relying on bugs or unintentional behavior.
> >>>
> >>> Just to clarify I'm not advocating strongly against this proposal, just
> >>> laying
> >>> out some considerations we should take into account. At the end of the
> >> day
> >>> we should do what's right rather than maintain compatibility with
> >> existing
> >>> bugs, but sometimes there's a reasonable middle ground.
> >>>
> >>> On Tue, May 19, 2020 at 6:15 PM John Roesler <vvcep...@apache.org>
> >> wrote:
> >>>> Thanks Sophie,
> >>>>
> >>>> Woah, that’s a nasty bug. Rather than working around it, I think we
> >> should
> >>>> just fix it. I’ll leave some comments on the Jira.
> >>>>
> >>>> It doesn’t seem like it should be this KIP’s concern that some serdes
> >>>> might be incorrectly written.
> >>>>
> >>>> Were there other practical concerns that you had in mind?
> >>>>
> >>>> Thanks,
> >>>> John
> >>>>
> >>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote:
> >>>>> I like this "fancy idea" to just flip the to/from bytes but I think
> >> there
> >>>>> are some practical limitations to implementing this. In particular
> >>>>> I'm thinking about this issue
> >>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the built-in
> >>>> signed
> >>>>> number serdes.
> >>>>>
> >>>>> This trick would actually fix the problem for negative-negative
> >> queries
> >>>>> (ie where to & from are negative) but would cause undetectable
> >>>>> incorrect results for negative-positive queries. For example, say you
> >>>>> call #range with from = -1 and to = 1, using the Short serdes. The
> >>>>> serialized bytes for that are
> >>>>>
> >>>>> from = 1111111111111111
> >>>>> to = 0000000000000001
> >>>>>
> >>>>> so we would end up flipping those and iterating over all keys from
> >>>>> 0000000000000001 to 1111111111111111. Iterating in lexicographical
> >>>>> order means we would iterate over every key in the space *except* for
> >>>>> 0, but 0 is actually the *only* other key we meant to be included in
> >> the
> >>>>> range query.
> >>>>>
> >>>>> Currently we just log a warning and return an empty iterator when
> >>>>> to < from, which is obviously also incorrect but feels slightly more
> >>>>> palatable. If we start automatically converting to reverse queries we
> >>>>> can't even log a warning in this case unless we wanted to log a
> >> warning
> >>>>> every time, which would be weird to do for a valid usage of a new
> >>>>> feature.
> >>>>>
> >>>>> All that said, I still like the idea overall. Off the top of my head
> >> I
> >>>> guess
> >>>>> we could add a store config to enable/disable automatic reverse
> >>>> iteration,
> >>>>> which is off by default?
> >>>>>
> >>>>> Thanks for the KIP! This will be a nice addition
> >>>>>
> >>>>> Sophie
> >>>>>
> >>>>>
> >>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <vvcep...@apache.org>
> >>>> wrote:
> >>>>>> Hi there Jorge,
> >>>>>>
> >>>>>> Thanks for the KIP!
> >>>>>>
> >>>>>> I think this feature sounds very reasonable.
> >>>>>>
> >>>>>> I'm not 100% sure if this is "too fancy", but what do you think
> >>>>>> about avoiding the enum by instead allowing people to flip
> >>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" would
> >>>>>> be a forward scan, and from "Z" to "A" would be a backward one?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -John
> >>>>>>
> >>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> I would like to start the discussion for KIP-617:
> >>>>>>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> >>>>>>> Looking forward to your feedback.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> Jorge.
> >>>>>>>
> >>>>>>>
> >>>>>>> Attachments:
> >>>>>>> * 0x5F2C6E22064982DF.asc
> 
> 
> Attachments:
> * 0x5F2C6E22064982DF.asc

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Reply via email to