Re: [VOTE] KIP-155 Add range scan for windowed state stores

Guozhang Wang Fri, 12 May 2017 09:22:49 -0700

Thanks for the note Michael, I think it makes sense.

And thinking about it more, I also agree that even for single-key fetch on
ReadOnlyWindow/SessionStore it's better to return the Iterator<Windowed<K>,
V> even when the windowed key's key field would always be the same, as it
is more consistent with the added APIs. But we need to do it in a backward
compatible way, which we can discuss in a follow-up KIP.



Guozhang

On Thu, May 11, 2017 at 10:38 AM, Xavier Léauté <xav...@confluent.io> wrote:

> Thanks Michal, you are correct. I can see your point now, and I can get
> behind returning Windowed<K> as well for windowed stores.
>
> It might make sense to revisit the single key iterator in the future and do
> the same for consistency, but I'd rather not break backwards compatibility
> unless we have a good reason to do so.
>
> Everyone else who already voted on the KIP, are there any objections to
> this change? I have updated the KIP accordingly.
>
> Thank you,
> Xavier
>
> On Thu, May 11, 2017 at 12:44 AM Michal Borowiecki <
> michal.borowie...@openbet.com> wrote:
>
> > Also, wrt
> >
> > > In the case of the window store, the "key" of the single-key iterator
> is
> > > the actual timestamp of the underlying entry, not just range of the
> > > window,
> > > so if we were to wrap the result key a window we wouldn't be getting
> back
> > > the equivalent of the single key iterator.
> > I believe the timestamp in the entry *is* the window start time (the end
> > time is implicitly known by adding the window size to the window start
> > time)
> >
> >
> > https://github.com/apache/kafka/blob/trunk/streams/src/
> main/java/org/apache/kafka/streams/kstream/internals/
> KStreamWindowReduce.java#L109
> >
> >
> >
> > https://github.com/apache/kafka/blob/trunk/streams/src/
> main/java/org/apache/kafka/streams/kstream/internals/
> KStreamWindowAggregate.java#L111
> >
> > Both use window.start() as the timestamp when storing into the
> WindowStore.
> >
> > Or am I confusing something horribly here? Hope not ;-)
> >
> >
> > If the above is correct, then using KeyValueIterator<Windowed<K>, V> as
> > the return type of the new fetch method would indeed not lose anything
> > the single key iterator is offering.
> >
> > The window end time could simply be calculated as window start time +
> > window size (window size would have to be passed from the store supplier
> > to the store implementation, which I think it isn't now but that's an
> > implementation detail).
> >
> > If you take objection to exposing the window end time because the single
> > key iterator doesn't do that, then an alternative could also be to have
> > the return type of the new fetch be something like
> > KeyValueItarator<Tuple2<K, Long>, V>, since the key is composed of the
> > actual key and the timestamp together. peakNextKey() would then allow
> > you to glimpse both the actual key and the associated window start time.
> > This feels like a better workaround then putting the KeyValue pair in
> > the V of the WindowStoreIterator<V>.
> >
> > All-in-all, I'd still prefer KeyValueIterator<Windowed<K>, V> as it more
> > clearly names what's what.
> >
> > What do you think?
> >
> > Thanks,
> >
> > Michal
> >
> > On 11/05/17 07:51, Michal Borowiecki wrote:
> > > Well, another concern, apart from potential confusion, is that you
> > > won't be able to peek the actual next key, just the timestamp. So the
> > > tradeoff is between having consistency in return types versus
> > > consistency in having the ability to know the next key without moving
> > > the iterator. To me the latter just feels more important.
> > >
> > > Thanks,
> > > Michal
> > > On 11 May 2017 12:46 a.m., Xavier Léauté <xav...@confluent.io> wrote:
> > >
> > >     Thank you for the feedback Michal.
> > >
> > >     While I agree the return may be a little bit more confusing to
> reason
> > >     about, the reason for doing so was to keep the range query
> interfaces
> > >     consistent with their single-key counterparts.
> > >
> > >     In the case of the window store, the "key" of the single-key
> > >     iterator is
> > >     the actual timestamp of the underlying entry, not just range of
> > >     the window,
> > >     so if we were to wrap the result key a window we wouldn't be
> > >     getting back
> > >     the equivalent of the single key iterator.
> > >
> > >     In both cases peekNextKey is just returning the timestamp of the
> > >     next entry
> > >     in the window store that matches the query.
> > >
> > >     In the case of the session store, we already return Windowed<K>
> > >     for the
> > >     single-key method, so it made sense there to also return
> > >     Windowed<K> for
> > >     the range method.
> > >
> > >     Hope this make sense? Let me know if you still have concerns about
> > >     this.
> > >
> > >     Thank you,
> > >     Xavier
> > >
> > >     On Wed, May 10, 2017 at 12:25 PM Michal Borowiecki <
> > >     michal.borowie...@openbet.com> wrote:
> > >
> > >     > Apologies, I missed the discussion (or lack thereof) about the
> > >     return
> > >     > type of:
> > >     >
> > >     > WindowStoreIterator<KeyValue<K, V>> fetch(K from, K to, long
> > >     timeFrom,
> > >     > long timeTo)
> > >     >
> > >     >
> > >     > WindowStoreIterator<V> (as the KIP mentions) is a subclass of
> > >     > KeyValueIterator<Long, V>
> > >     >
> > >     > KeyValueIterator<K,V> has the following method:
> > >     >
> > >     > /** * Peek at the next key without advancing the iterator *
> > >     @return the
> > >     > key of the next value that would be returned from the next call
> > >     to next
> > >     > */ K peekNextKey();
> > >     >
> > >     > Given the type in this case will be Long, I assume what it would
> > >     return
> > >     > is the window timestamp of the next found record?
> > >     >
> > >     >
> > >     > In the case of WindowStoreIterator<V> fetch(K key, long
> > >     timeFrom, long
> > >     > timeTo);
> > >     > all records found by fetch have the same key, so it's harmless
> > >     to return
> > >     > the timestamp of the next found window but here we have varying
> > >     keys and
> > >     > varying windows, so won't it be too confusing?
> > >     >
> > >     > KeyValueIterator<Windowed<K>, V> (as in the proposed
> > >     > ReadOnlySessionStore.fetch) just feels much more intuitive.
> > >     >
> > >     > Apologies again for jumping onto this only once the voting has
> > >     already
> > >     > begun.
> > >     > Thanks,
> > >     > Michał
> > >     >
> > >     > On 10/05/17 20:08, Sriram Subramanian wrote:
> > >     > > +1
> > >     > >
> > >     > > On Wed, May 10, 2017 at 11:42 AM, Bill Bejeck
> > >     <bbej...@gmail.com> wrote:
> > >     > >
> > >     > >> +1
> > >     > >>
> > >     > >> Thanks,
> > >     > >> Bill
> > >     > >>
> > >     > >> On Wed, May 10, 2017 at 2:38 PM, Guozhang Wang
> > >     <wangg...@gmail.com>
> > >     > wrote:
> > >     > >>
> > >     > >>> +1. Thank you!
> > >     > >>>
> > >     > >>> On Wed, May 10, 2017 at 11:30 AM, Xavier Léauté
> > >     <xav...@confluent.io>
> > >     > >>> wrote:
> > >     > >>>
> > >     > >>>> Hi everyone,
> > >     > >>>>
> > >     > >>>> Since there aren't any objections to this addition, I would
> > >     like to
> > >     > >> start
> > >     > >>>> the voting on KIP-155 so we can hopefully get this into
> 0.11.
> > >     > >>>>
> > >     > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> > >     > >>>> 155+-+Add+range+scan+for+windowed+state+stores
> > >     > >>>>
> > >     > >>>> Voting will stay active for at least 72 hours.
> > >     > >>>>
> > >     > >>>> Thank you,
> > >     > >>>> Xavier
> > >     > >>>>
> > >     > >>>
> > >     > >>>
> > >     > >>> --
> > >     > >>> -- Guozhang
> > >     > >>>
> > >     >
> > >     >
> > >
> >
> >
>



-- 
-- Guozhang

Re: [VOTE] KIP-155 Add range scan for windowed state stores

Reply via email to