Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Bruno Cadonna Tue, 28 May 2019 03:27:21 -0700

Hi all,

My comments on this KIP:


1. I would use `all()` instead of `range()` because the functionality
is immediately clear without the need to look at the parameter list.

2. I would decouple method names from metrics name, because this
allows us to change one naming independently from the other in the
future. From the previous discussion on this thread, I have also the
feeling that the naming requirements differ between Java code and
metrics.

My proposal for the metric names is the following:

value-by-key
range-by-key
value-by-key-and-time
range-by-key-range
range-by-time-range
range-by-key-and-time-range
range-by-key-range-and-time-range
range-all

I omitted the metric types like latency-avg, etc. The first word
denotes whether the query is a point query (`value`) or a range query
(`range`). The words after the `by` denote the parameter types of the
query. `range-all` is a special case. I hope, I did not miss any.
We could also prefix the above names with `get-` if you think, it
would make the names clearer.

3. Should `[storeType]-id` be `[storeType]-state-id`?

4. Some method signatures in the KIP under comment `// new` have still
the `@Deprecated` annotation. Copy&Paste error?

5. Should `type = stream-[storeType]-metrics` be `type =
streams-[storeType]-metrics` or do we need to keep `stream` for
historical/backward-compatibility reasons?

Best,
Bruno

On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <[email protected]> wrote:
>
> Thanks John and Guozhang.
>
> I also lean towards different metric names.
>
> While updating the KIP, I also realized that the whole store API for
> window and session store is actually rather inconsistent. Hence, I
> extended the scope of this KIP to cleanup both interfaces and changed
> the KIP name to
>
>    "Cleanup built-in Store interfaces"
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
>
> I also included a small change to KeyValueStore to align all built-in
> stores holistically.
>
>
>
> Looking forward to your feedback.
>
>
>
> -Matthias
>
>
>
> On 5/7/19 1:19 AM, Guozhang Wang wrote:
> > Thanks John.
> >
> > I think I'm convinced about not collapsing the metrics measuring for
> > various function calls. Regarding the possible solution I'm a bit inclined
> > to just distinguish on the metric names directly over on tags: since our
> > current metrics naming are a tad messed up anyways, fixing them in one shot
> > as a breaking change sounds reasonable to me.
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 6, 2019 at 9:14 AM John Roesler <[email protected]> wrote:
> >
> >> Thanks all (or range? ;) ) for the discussion. Good points all around.
> >>
> >> Although I see the allure of naming the metrics the same as the things
> >> they're measuring, it seems not to be perfect. Seconding Matthias's latter
> >> thought, I think it's likely you'd want to measure the method calls
> >> independently, since the different range variants would have wildly
> >> different characteristics, which could then lead you to want to orient the
> >> storage differently to support particular use cases.
> >>
> >> Pointing out some structural characteristics (I know you all know this
> >> stuff, I'm just constructing a table for analysis):
> >> * Java supports method-name overloading. *Different* methods can have the
> >> same names, distinguished by arg lists; it doesn't change the fact that
> >> they are different methods.
> >> * Metrics does not support metric-name overloading, but metric names do
> >> have some structure we could exploit, if you consider the tags.
> >>
> >> It seems to me that there's actually more domain mismatch if we just have
> >> one metric named "range", since (e.g., in the SessionStore proposal above)
> >> the Java API has *four* methods named "range".
> >>
> >> Two potential solutions I see:
> >> * hierarchical metric names: "range-single-key-all-time",
> >> "range-key-range-all-time", "range-single-key-time-range",
> >> "range-key-range-time-range", maybe with better names... I'm not the best
> >> at this stuff. Hopefully, you see the point, though... they all start with
> >> "range", which provides the association to the method, and all have a
> >> suffix which identifies the overload being measured.
> >> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
> >> {"variant": "key-range-all-time"}, "range" {"variant":
> >> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
> >> you could even split the tags up semantically, but my instinct says that
> >> that would just make it harder to digest the metrics later on.
> >>
> >> Just some ideas.
> >> -John
> >>
> >> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <[email protected]>
> >> wrote:
> >>
> >>> Thanks for the input Guozhang. I was not aware of those dependencies.
> >>>
> >>> It might be good to align this KIP with the metrics cleanup. Not sure
> >>> atm, if we should use different metric names for different overloads,
> >>> even if those have the same method name?
> >>>
> >>> If we rename all method to `range()` and use the same metric name for
> >>> all, one could argue that this is still fine, because the metric
> >>> collects how often a range query is executed (regardless of the range
> >>> itself).
> >>>
> >>> On the other hand, this would basically be a "roll up". It could still
> >>> be valuable to distinguish between single-key-time-range,
> >>> key-range-time-range, and all-range queries. Users could still aggregate
> >>> those later if they are not interested in the details, while it's not
> >>> possible for user to split a pre-aggregated metric into it's component.
> >>>
> >>>
> >>> Input from others might be helpful here, too.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> >>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> >>>> realized there are a bunch of issues on metric names v.s. function
> >> names,
> >>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
> >>> etc.
> >>>> So in that KIP I proposed to make the function name aligned with
> >> metrics
> >>>> name. So suppose we rename the functions from `fetch` to `range` I'd
> >>>> suggest we make this change as part of KIP-444 as well. Note that it
> >>> means
> >>>> different functions with the same name `range` will be measured under a
> >>>> single metric then.
> >>>>
> >>>> But still for function named `all` it will be measured under a separate
> >>>> metric named `all`, so I'm just clarifying with you if that's the
> >>> intention.
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <[email protected]
> >>>
> >>>> wrote:
> >>>>
> >>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
> >> not
> >>>>> take any parameters to limit a range and is a good name IMHO. But I am
> >>>>> not married to keep `all()` and if we think we should rename it, too,
> >> I
> >>>>> am fine with it.
> >>>>>
> >>>>> Not sure what connection you make to metrics though. Can you
> >> elaborate?
> >>>>>
> >>>>>
> >>>>> Would be interested to hear others opinions on this, too.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>>>>> I like the renaming, since it also aligns with our metrics cleanup
> >>>>>> (KIP-444) which touches upon the store level metrics as well.
> >>>>>>
> >>>>>> One question: you seems still suggesting to keep "all" with the
> >> current
> >>>>>> name (and also using a separate metric for it), what's the difference
> >>>>>> between this one and other "range" functions?
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> >> [email protected]
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the input.
> >>>>>>>
> >>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>> function
> >>>>>>>>> and the old functions that we want to deprecate / remove right?
> >> The
> >>>>>>>
> >>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
> >>> which
> >>>>>>> we want to change the return type.
> >>>>>>>
> >>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
> >>> we
> >>>>>>> could either only deprecate those fetch methods that return
> >>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>>>>
> >>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>>>>> consider all methods that are called "fetch":
> >>>>>>>
> >>>>>>> ReadOnlyWindowStore:
> >>>>>>>
> >>>>>>>> V fetch(K, long)
> >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>
> >>>>>>> WindowStore:
> >>>>>>>
> >>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>
> >>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> >> that
> >>>>>>> fetch over all keys.
> >>>>>>>
> >>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> >>> rename
> >>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> >> to
> >>>>>>> have range()/rangeAll().
> >>>>>>>
> >>>>>>> If we do this, we might consider to rename methods for SessionStore,
> >>>>>>> too. There is
> >>>>>>>
> >>>>>>>> ReadOnlySessionStore#fetch(K)
> >>>>>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>>>>> SessionStore#findSessions(K, long, long)
> >>>>>>>> SessionStore#findSessions(K, K, long, long)
> >>>>>>>> SessionStore#fetchSession(K, long, long);
> >>>>>>>
> >>>>>>> Consistent renaming might be:
> >>>>>>>
> >>>>>>> ReadOnlySessionStore#range(K)
> >>>>>>> ReadOnlySessionStore#range(K, K)
> >>>>>>> SessionStore#range(K, long, long)
> >>>>>>> SessionStore#range(K, K, long, long)
> >>>>>>> SessionStore#get(K, long, long);
> >>>>>>>
> >>>>>>>
> >>>>>>> Not sure if extensive renaming might have too big of an impact.
> >>> However,
> >>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>>>>> thus, it might also provide some additional value.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>>>>> Hello Matthias,
> >>>>>>>>
> >>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>> function
> >>>>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>>>>> existing ones have different signatures with parameters so that
> >> they
> >>>>>>> should
> >>>>>>>> not have conflicts.
> >>>>>>>>
> >>>>>>>> I was thinking about just make the change directly without
> >>> deprecating
> >>>>>>>> existing ones which would require users of 2.3 to make code changes
> >>> --
> >>>>>>> this
> >>>>>>>> code change looks reasonably straight-forward to me and not much
> >>> worth
> >>>>>>>> deferring to later when the deprecated ones are removed.
> >>>>>>>>
> >>>>>>>> On the other hand, just deprecating "WindowIterator" without add
> >> new
> >>>>>>>> functions seems not very useful for users either since it is only
> >>> used
> >>>>> as
> >>>>>>>> an indicator but users cannot make code changes during this phase
> >>>>>>> anyways,
> >>>>>>>> so it is still a `one-cut` deal when we eventually remove the
> >>>>> deprecated
> >>>>>>>> ones and add the new one.
> >>>>>>>>
> >>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
> >>> with
> >>>>>>> new
> >>>>>>>> functions in one release, but if people have a good idea of the
> >>>>> renaming
> >>>>>>>> approach (I do not have a good one on top of my head :) I can also
> >> be
> >>>>>>>> convinced that way.
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >>>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I am open to change the return type to
> >>>>>>>>>
> >>>>>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>>>>
> >>>>>>>>> However, this requires to rename
> >>>>>>>>>
> >>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>>>>
> >>>>>>>>> to avoid naming conflicts.
> >>>>>>>>>
> >>>>>>>>> What new name would you suggest? The existing methods are called
> >>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>>>>
> >>>>>>>>> While I think it would be good to get fully aligned return types,
> >> I
> >>> am
> >>>>>>>>> not sure how we can get aligned method names (without renaming all
> >>> of
> >>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >>>>> up, I
> >>>>>>>>> am no opposed.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>>>>> I was thinking about changing the return type even, to
> >>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >>>>> about
> >>>>>>>>> the
> >>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
> >> clearly
> >>>>>>>>> either),
> >>>>>>>>>> note it is not backward compatible at all.
> >>>>>>>>>>
> >>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> >>> that
> >>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >>>>> people
> >>>>>>>>> felt
> >>>>>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>>>>> [email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I remember thinking this while working on window stores, am
> >>>>> definitely
> >>>>>>>>> for
> >>>>>>>>>>> it.
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <[email protected]
> >>>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>>>>> -John
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>>>>> [email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Reply via email to