Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Bruno Cadonna Tue, 28 May 2019 12:32:01 -0700

Hi Matthias,

2)
Yes, this is how I meant it.
I am still in favour of `value-by-...` instead of `get`, because it
gives you immediately the meaning of the metric without the need to
know the API of the stores.


3)
I asked to avoid misunderstandings because in KIP-444, `-state-` is
stated explicitly in the tag.

5)
The question was not about correctness. I know that we use `stream` in
the metrics group. The question was rather if we want to change it.
The streams metrics interface is called `StreamsMetrics`. I understand
that this has low priority. Just wanted to mention it.

Best,
Bruno


On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <matth...@confluent.io> wrote:
>
> Thanks Bruno.
>
> I updated the KIP without changing the metric name yet -- want to
> discuss this first.
>
>
>
> 1) I am also ok to keep `all()`
>
>
> 2) That's a good idea. To make sure I understand your suggestion
> correctly, below the mapping between metric name an methods.
> (Some metric names are use by multiple stores):
>
> (ReadOnly)KeyValueStore:
>
> >> value-by-key : ReadOnlyKeyValueStore#get(K key)
>
> -> replaces `get` metric
>
> I am actually not sure if I like this. Just keeping `get` instead of
> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
>
>
>
> (ReadOnly)WindowStore:
>
> >> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long 
> >> windowStartTime)
>
> I think a simple `get` might be better
>
> >> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant 
> >> from, Instant to)
> >> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long 
> >> toTime)
>
> >> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K 
> >> to, Instant from, Instant to)
> >> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long 
> >> fromTime, long toTime)
>
> >> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
> >> range-by-time-range : WindowStore#range(long fromTime, long toTime)
>
> >> range-all : ReadOnlyWindowStore#all()
>
>
> (ReadOnly)SessionStore:
>
> >> range-by-key : ReadOnlySessionStore#range(K key)
>
> >> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
> >> range-by-key-and-time-range : SessionStore#range(K key, long 
> >> earliestSessionEndTime, long latestSessionStartTime)
>
> >> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, 
> >> long earliestSessionEndTime, long latestSessionStartTime)
>
> >> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, 
> >> long sessionEndTime)
>
> I think a simple `get` might be better
>
> >> range-all : ReadOnlyKeyValueStore#all()
>
>
>
>
> 3) the `state-` part is already contained in `[storeType]` do I think
> it's correct as-is
>
>
> 4) Ack. Fixed.
>
>
> 5) I think `stream` (plural) is correct. Cf
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87
>
>
>
> -Matthias
>
>
> On 5/28/19 3:26 AM, Bruno Cadonna wrote:
> > Hi all,
> >
> > My comments on this KIP:
> >
> > 1. I would use `all()` instead of `range()` because the functionality
> > is immediately clear without the need to look at the parameter list.
> >
> > 2. I would decouple method names from metrics name, because this
> > allows us to change one naming independently from the other in the
> > future. From the previous discussion on this thread, I have also the
> > feeling that the naming requirements differ between Java code and
> > metrics.
> >
> > My proposal for the metric names is the following:
> >
> > value-by-key
> > range-by-key
> > value-by-key-and-time
> > range-by-key-range
> > range-by-time-range
> > range-by-key-and-time-range
> > range-by-key-range-and-time-range
> > range-all
> >
> > I omitted the metric types like latency-avg, etc. The first word
> > denotes whether the query is a point query (`value`) or a range query
> > (`range`). The words after the `by` denote the parameter types of the
> > query. `range-all` is a special case. I hope, I did not miss any.
> > We could also prefix the above names with `get-` if you think, it
> > would make the names clearer.
> >
> > 3. Should `[storeType]-id` be `[storeType]-state-id`?
> >
> > 4. Some method signatures in the KIP under comment `// new` have still
> > the `@Deprecated` annotation. Copy&Paste error?
> >
> > 5. Should `type = stream-[storeType]-metrics` be `type =
> > streams-[storeType]-metrics` or do we need to keep `stream` for
> > historical/backward-compatibility reasons?
> >
> > Best,
> > Bruno
> >
> > On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <matth...@confluent.io> 
> > wrote:
> >>
> >> Thanks John and Guozhang.
> >>
> >> I also lean towards different metric names.
> >>
> >> While updating the KIP, I also realized that the whole store API for
> >> window and session store is actually rather inconsistent. Hence, I
> >> extended the scope of this KIP to cleanup both interfaces and changed
> >> the KIP name to
> >>
> >>    "Cleanup built-in Store interfaces"
> >>
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
> >>
> >> I also included a small change to KeyValueStore to align all built-in
> >> stores holistically.
> >>
> >>
> >>
> >> Looking forward to your feedback.
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 5/7/19 1:19 AM, Guozhang Wang wrote:
> >>> Thanks John.
> >>>
> >>> I think I'm convinced about not collapsing the metrics measuring for
> >>> various function calls. Regarding the possible solution I'm a bit inclined
> >>> to just distinguish on the metric names directly over on tags: since our
> >>> current metrics naming are a tad messed up anyways, fixing them in one 
> >>> shot
> >>> as a breaking change sounds reasonable to me.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Mon, May 6, 2019 at 9:14 AM John Roesler <j...@confluent.io> wrote:
> >>>
> >>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
> >>>>
> >>>> Although I see the allure of naming the metrics the same as the things
> >>>> they're measuring, it seems not to be perfect. Seconding Matthias's 
> >>>> latter
> >>>> thought, I think it's likely you'd want to measure the method calls
> >>>> independently, since the different range variants would have wildly
> >>>> different characteristics, which could then lead you to want to orient 
> >>>> the
> >>>> storage differently to support particular use cases.
> >>>>
> >>>> Pointing out some structural characteristics (I know you all know this
> >>>> stuff, I'm just constructing a table for analysis):
> >>>> * Java supports method-name overloading. *Different* methods can have the
> >>>> same names, distinguished by arg lists; it doesn't change the fact that
> >>>> they are different methods.
> >>>> * Metrics does not support metric-name overloading, but metric names do
> >>>> have some structure we could exploit, if you consider the tags.
> >>>>
> >>>> It seems to me that there's actually more domain mismatch if we just have
> >>>> one metric named "range", since (e.g., in the SessionStore proposal 
> >>>> above)
> >>>> the Java API has *four* methods named "range".
> >>>>
> >>>> Two potential solutions I see:
> >>>> * hierarchical metric names: "range-single-key-all-time",
> >>>> "range-key-range-all-time", "range-single-key-time-range",
> >>>> "range-key-range-time-range", maybe with better names... I'm not the best
> >>>> at this stuff. Hopefully, you see the point, though... they all start 
> >>>> with
> >>>> "range", which provides the association to the method, and all have a
> >>>> suffix which identifies the overload being measured.
> >>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, 
> >>>> "range"
> >>>> {"variant": "key-range-all-time"}, "range" {"variant":
> >>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . 
> >>>> Or
> >>>> you could even split the tags up semantically, but my instinct says that
> >>>> that would just make it harder to digest the metrics later on.
> >>>>
> >>>> Just some ideas.
> >>>> -John
> >>>>
> >>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <matth...@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
> >>>>>
> >>>>> It might be good to align this KIP with the metrics cleanup. Not sure
> >>>>> atm, if we should use different metric names for different overloads,
> >>>>> even if those have the same method name?
> >>>>>
> >>>>> If we rename all method to `range()` and use the same metric name for
> >>>>> all, one could argue that this is still fine, because the metric
> >>>>> collects how often a range query is executed (regardless of the range
> >>>>> itself).
> >>>>>
> >>>>> On the other hand, this would basically be a "roll up". It could still
> >>>>> be valuable to distinguish between single-key-time-range,
> >>>>> key-range-time-range, and all-range queries. Users could still aggregate
> >>>>> those later if they are not interested in the details, while it's not
> >>>>> possible for user to split a pre-aggregated metric into it's component.
> >>>>>
> >>>>>
> >>>>> Input from others might be helpful here, too.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> >>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> >>>>>> realized there are a bunch of issues on metric names v.s. function
> >>>> names,
> >>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
> >>>>> etc.
> >>>>>> So in that KIP I proposed to make the function name aligned with
> >>>> metrics
> >>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
> >>>>>> suggest we make this change as part of KIP-444 as well. Note that it
> >>>>> means
> >>>>>> different functions with the same name `range` will be measured under a
> >>>>>> single metric then.
> >>>>>>
> >>>>>> But still for function named `all` it will be measured under a separate
> >>>>>> metric named `all`, so I'm just clarifying with you if that's the
> >>>>> intention.
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matth...@confluent.io
> >>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
> >>>> not
> >>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
> >>>>>>> not married to keep `all()` and if we think we should rename it, too,
> >>>> I
> >>>>>>> am fine with it.
> >>>>>>>
> >>>>>>> Not sure what connection you make to metrics though. Can you
> >>>> elaborate?
> >>>>>>>
> >>>>>>>
> >>>>>>> Would be interested to hear others opinions on this, too.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
> >>>>>>>> (KIP-444) which touches upon the store level metrics as well.
> >>>>>>>>
> >>>>>>>> One question: you seems still suggesting to keep "all" with the
> >>>> current
> >>>>>>>> name (and also using a separate metric for it), what's the difference
> >>>>>>>> between this one and other "range" functions?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> >>>> matth...@confluent.io
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks for the input.
> >>>>>>>>>
> >>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>> function
> >>>>>>>>>>> and the old functions that we want to deprecate / remove right?
> >>>> The
> >>>>>>>>>
> >>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
> >>>>> which
> >>>>>>>>> we want to change the return type.
> >>>>>>>>>
> >>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
> >>>>> we
> >>>>>>>>> could either only deprecate those fetch methods that return
> >>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>>>>>>
> >>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>>>>>>> consider all methods that are called "fetch":
> >>>>>>>>>
> >>>>>>>>> ReadOnlyWindowStore:
> >>>>>>>>>
> >>>>>>>>>> V fetch(K, long)
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>
> >>>>>>>>> WindowStore:
> >>>>>>>>>
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>
> >>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> >>>> that
> >>>>>>>>> fetch over all keys.
> >>>>>>>>>
> >>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> >>>>> rename
> >>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> >>>> to
> >>>>>>>>> have range()/rangeAll().
> >>>>>>>>>
> >>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
> >>>>>>>>> too. There is
> >>>>>>>>>
> >>>>>>>>>> ReadOnlySessionStore#fetch(K)
> >>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>>>>>>> SessionStore#findSessions(K, long, long)
> >>>>>>>>>> SessionStore#findSessions(K, K, long, long)
> >>>>>>>>>> SessionStore#fetchSession(K, long, long);
> >>>>>>>>>
> >>>>>>>>> Consistent renaming might be:
> >>>>>>>>>
> >>>>>>>>> ReadOnlySessionStore#range(K)
> >>>>>>>>> ReadOnlySessionStore#range(K, K)
> >>>>>>>>> SessionStore#range(K, long, long)
> >>>>>>>>> SessionStore#range(K, K, long, long)
> >>>>>>>>> SessionStore#get(K, long, long);
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Not sure if extensive renaming might have too big of an impact.
> >>>>> However,
> >>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>>>>>>> thus, it might also provide some additional value.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>>>>>>> Hello Matthias,
> >>>>>>>>>>
> >>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>> function
> >>>>>>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>>>>>>> existing ones have different signatures with parameters so that
> >>>> they
> >>>>>>>>> should
> >>>>>>>>>> not have conflicts.
> >>>>>>>>>>
> >>>>>>>>>> I was thinking about just make the change directly without
> >>>>> deprecating
> >>>>>>>>>> existing ones which would require users of 2.3 to make code changes
> >>>>> --
> >>>>>>>>> this
> >>>>>>>>>> code change looks reasonably straight-forward to me and not much
> >>>>> worth
> >>>>>>>>>> deferring to later when the deprecated ones are removed.
> >>>>>>>>>>
> >>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
> >>>> new
> >>>>>>>>>> functions seems not very useful for users either since it is only
> >>>>> used
> >>>>>>> as
> >>>>>>>>>> an indicator but users cannot make code changes during this phase
> >>>>>>>>> anyways,
> >>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
> >>>>>>> deprecated
> >>>>>>>>>> ones and add the new one.
> >>>>>>>>>>
> >>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
> >>>>> with
> >>>>>>>>> new
> >>>>>>>>>> functions in one release, but if people have a good idea of the
> >>>>>>> renaming
> >>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
> >>>> be
> >>>>>>>>>> convinced that way.
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >>>>>>> matth...@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I am open to change the return type to
> >>>>>>>>>>>
> >>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>>>>>>
> >>>>>>>>>>> However, this requires to rename
> >>>>>>>>>>>
> >>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>>>>>>
> >>>>>>>>>>> to avoid naming conflicts.
> >>>>>>>>>>>
> >>>>>>>>>>> What new name would you suggest? The existing methods are called
> >>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>>>>>>
> >>>>>>>>>>> While I think it would be good to get fully aligned return types,
> >>>> I
> >>>>> am
> >>>>>>>>>>> not sure how we can get aligned method names (without renaming all
> >>>>> of
> >>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >>>>>>> up, I
> >>>>>>>>>>> am no opposed.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thoughts?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>>>>>>> I was thinking about changing the return type even, to
> >>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >>>>>>> about
> >>>>>>>>>>> the
> >>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
> >>>> clearly
> >>>>>>>>>>> either),
> >>>>>>>>>>>> note it is not backward compatible at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> >>>>> that
> >>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >>>>>>> people
> >>>>>>>>>>> felt
> >>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>>>>>>> sop...@confluent.io>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I remember thinking this while working on window stores, am
> >>>>>>> definitely
> >>>>>>>>>>> for
> >>>>>>>>>>>>> it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <j...@confluent.io
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>>>>>>> matth...@confluent.io>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Reply via email to