Thanks Bruno. I updated the KIP without changing the metric name yet -- want to discuss this first.
1) I am also ok to keep `all()` 2) That's a good idea. To make sure I understand your suggestion correctly, below the mapping between metric name an methods. (Some metric names are use by multiple stores): (ReadOnly)KeyValueStore: >> value-by-key : ReadOnlyKeyValueStore#get(K key) -> replaces `get` metric I am actually not sure if I like this. Just keeping `get` instead of `value-by-key`, `value-by-key-and-time` seems more reasonable to me. (ReadOnly)WindowStore: >> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime) I think a simple `get` might be better >> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, >> Instant to) >> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long >> toTime) >> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, >> Instant from, Instant to) >> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long >> fromTime, long toTime) >> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to) >> range-by-time-range : WindowStore#range(long fromTime, long toTime) >> range-all : ReadOnlyWindowStore#all() (ReadOnly)SessionStore: >> range-by-key : ReadOnlySessionStore#range(K key) >> range-by-key-range : ReadOnlySessionStore#range(K from, K to) >> range-by-key-and-time-range : SessionStore#range(K key, long >> earliestSessionEndTime, long latestSessionStartTime) >> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, >> long earliestSessionEndTime, long latestSessionStartTime) >> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long >> sessionEndTime) I think a simple `get` might be better >> range-all : ReadOnlyKeyValueStore#all() 3) the `state-` part is already contained in `[storeType]` do I think it's correct as-is 4) Ack. Fixed. 5) I think `stream` (plural) is correct. Cf https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87 -Matthias On 5/28/19 3:26 AM, Bruno Cadonna wrote: > Hi all, > > My comments on this KIP: > > 1. I would use `all()` instead of `range()` because the functionality > is immediately clear without the need to look at the parameter list. > > 2. I would decouple method names from metrics name, because this > allows us to change one naming independently from the other in the > future. From the previous discussion on this thread, I have also the > feeling that the naming requirements differ between Java code and > metrics. > > My proposal for the metric names is the following: > > value-by-key > range-by-key > value-by-key-and-time > range-by-key-range > range-by-time-range > range-by-key-and-time-range > range-by-key-range-and-time-range > range-all > > I omitted the metric types like latency-avg, etc. The first word > denotes whether the query is a point query (`value`) or a range query > (`range`). The words after the `by` denote the parameter types of the > query. `range-all` is a special case. I hope, I did not miss any. > We could also prefix the above names with `get-` if you think, it > would make the names clearer. > > 3. Should `[storeType]-id` be `[storeType]-state-id`? > > 4. Some method signatures in the KIP under comment `// new` have still > the `@Deprecated` annotation. Copy&Paste error? > > 5. Should `type = stream-[storeType]-metrics` be `type = > streams-[storeType]-metrics` or do we need to keep `stream` for > historical/backward-compatibility reasons? > > Best, > Bruno > > On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <matth...@confluent.io> wrote: >> >> Thanks John and Guozhang. >> >> I also lean towards different metric names. >> >> While updating the KIP, I also realized that the whole store API for >> window and session store is actually rather inconsistent. Hence, I >> extended the scope of this KIP to cleanup both interfaces and changed >> the KIP name to >> >> "Cleanup built-in Store interfaces" >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198 >> >> I also included a small change to KeyValueStore to align all built-in >> stores holistically. >> >> >> >> Looking forward to your feedback. >> >> >> >> -Matthias >> >> >> >> On 5/7/19 1:19 AM, Guozhang Wang wrote: >>> Thanks John. >>> >>> I think I'm convinced about not collapsing the metrics measuring for >>> various function calls. Regarding the possible solution I'm a bit inclined >>> to just distinguish on the metric names directly over on tags: since our >>> current metrics naming are a tad messed up anyways, fixing them in one shot >>> as a breaking change sounds reasonable to me. >>> >>> >>> Guozhang >>> >>> >>> On Mon, May 6, 2019 at 9:14 AM John Roesler <j...@confluent.io> wrote: >>> >>>> Thanks all (or range? ;) ) for the discussion. Good points all around. >>>> >>>> Although I see the allure of naming the metrics the same as the things >>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter >>>> thought, I think it's likely you'd want to measure the method calls >>>> independently, since the different range variants would have wildly >>>> different characteristics, which could then lead you to want to orient the >>>> storage differently to support particular use cases. >>>> >>>> Pointing out some structural characteristics (I know you all know this >>>> stuff, I'm just constructing a table for analysis): >>>> * Java supports method-name overloading. *Different* methods can have the >>>> same names, distinguished by arg lists; it doesn't change the fact that >>>> they are different methods. >>>> * Metrics does not support metric-name overloading, but metric names do >>>> have some structure we could exploit, if you consider the tags. >>>> >>>> It seems to me that there's actually more domain mismatch if we just have >>>> one metric named "range", since (e.g., in the SessionStore proposal above) >>>> the Java API has *four* methods named "range". >>>> >>>> Two potential solutions I see: >>>> * hierarchical metric names: "range-single-key-all-time", >>>> "range-key-range-all-time", "range-single-key-time-range", >>>> "range-key-range-time-range", maybe with better names... I'm not the best >>>> at this stuff. Hopefully, you see the point, though... they all start with >>>> "range", which provides the association to the method, and all have a >>>> suffix which identifies the overload being measured. >>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range" >>>> {"variant": "key-range-all-time"}, "range" {"variant": >>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or >>>> you could even split the tags up semantically, but my instinct says that >>>> that would just make it harder to digest the metrics later on. >>>> >>>> Just some ideas. >>>> -John >>>> >>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <matth...@confluent.io> >>>> wrote: >>>> >>>>> Thanks for the input Guozhang. I was not aware of those dependencies. >>>>> >>>>> It might be good to align this KIP with the metrics cleanup. Not sure >>>>> atm, if we should use different metric names for different overloads, >>>>> even if those have the same method name? >>>>> >>>>> If we rename all method to `range()` and use the same metric name for >>>>> all, one could argue that this is still fine, because the metric >>>>> collects how often a range query is executed (regardless of the range >>>>> itself). >>>>> >>>>> On the other hand, this would basically be a "roll up". It could still >>>>> be valuable to distinguish between single-key-time-range, >>>>> key-range-time-range, and all-range queries. Users could still aggregate >>>>> those later if they are not interested in the details, while it's not >>>>> possible for user to split a pre-aggregated metric into it's component. >>>>> >>>>> >>>>> Input from others might be helpful here, too. >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote: >>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I >>>>>> realized there are a bunch of issues on metric names v.s. function >>>> names, >>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`, >>>>> etc. >>>>>> So in that KIP I proposed to make the function name aligned with >>>> metrics >>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd >>>>>> suggest we make this change as part of KIP-444 as well. Note that it >>>>> means >>>>>> different functions with the same name `range` will be measured under a >>>>>> single metric then. >>>>>> >>>>>> But still for function named `all` it will be measured under a separate >>>>>> metric named `all`, so I'm just clarifying with you if that's the >>>>> intention. >>>>>> >>>>>> >>>>>> Guozhang >>>>>> >>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matth...@confluent.io >>>>> >>>>>> wrote: >>>>>> >>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does >>>> not >>>>>>> take any parameters to limit a range and is a good name IMHO. But I am >>>>>>> not married to keep `all()` and if we think we should rename it, too, >>>> I >>>>>>> am fine with it. >>>>>>> >>>>>>> Not sure what connection you make to metrics though. Can you >>>> elaborate? >>>>>>> >>>>>>> >>>>>>> Would be interested to hear others opinions on this, too. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote: >>>>>>>> I like the renaming, since it also aligns with our metrics cleanup >>>>>>>> (KIP-444) which touches upon the store level metrics as well. >>>>>>>> >>>>>>>> One question: you seems still suggesting to keep "all" with the >>>> current >>>>>>>> name (and also using a separate metric for it), what's the difference >>>>>>>> between this one and other "range" functions? >>>>>>>> >>>>>>>> >>>>>>>> Guozhang >>>>>>>> >>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax < >>>> matth...@confluent.io >>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Thanks for the input. >>>>>>>>> >>>>>>>>>>> Just to clarify the naming conflicts is between the newly added >>>>>>> function >>>>>>>>>>> and the old functions that we want to deprecate / remove right? >>>> The >>>>>>>>> >>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for >>>>> which >>>>>>>>> we want to change the return type. >>>>>>>>> >>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus, >>>>> we >>>>>>>>> could either only deprecate those fetch methods that return >>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0. >>>>>>>>> >>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to >>>>>>>>> consider all methods that are called "fetch": >>>>>>>>> >>>>>>>>> ReadOnlyWindowStore: >>>>>>>>> >>>>>>>>>> V fetch(K, long) >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant) >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) >>>>>>>>> >>>>>>>>> WindowStore: >>>>>>>>> >>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long) >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)> >>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long) >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) >>>>>>>>> >>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant) >>>> that >>>>>>>>> fetch over all keys. >>>>>>>>> >>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and >>>>> rename >>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason >>>> to >>>>>>>>> have range()/rangeAll(). >>>>>>>>> >>>>>>>>> If we do this, we might consider to rename methods for SessionStore, >>>>>>>>> too. There is >>>>>>>>> >>>>>>>>>> ReadOnlySessionStore#fetch(K) >>>>>>>>>> ReadOnlySessionStore#fetch(K, K) >>>>>>>>>> SessionStore#findSessions(K, long, long) >>>>>>>>>> SessionStore#findSessions(K, K, long, long) >>>>>>>>>> SessionStore#fetchSession(K, long, long); >>>>>>>>> >>>>>>>>> Consistent renaming might be: >>>>>>>>> >>>>>>>>> ReadOnlySessionStore#range(K) >>>>>>>>> ReadOnlySessionStore#range(K, K) >>>>>>>>> SessionStore#range(K, long, long) >>>>>>>>> SessionStore#range(K, K, long, long) >>>>>>>>> SessionStore#get(K, long, long); >>>>>>>>> >>>>>>>>> >>>>>>>>> Not sure if extensive renaming might have too big of an impact. >>>>> However, >>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`, >>>>>>>>> thus, it might also provide some additional value. >>>>>>>>> >>>>>>>>> Thoughts? >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote: >>>>>>>>>> Hello Matthias, >>>>>>>>>> >>>>>>>>>> Just to clarify the naming conflicts is between the newly added >>>>>>> function >>>>>>>>>> and the old functions that we want to deprecate / remove right? The >>>>>>>>>> existing ones have different signatures with parameters so that >>>> they >>>>>>>>> should >>>>>>>>>> not have conflicts. >>>>>>>>>> >>>>>>>>>> I was thinking about just make the change directly without >>>>> deprecating >>>>>>>>>> existing ones which would require users of 2.3 to make code changes >>>>> -- >>>>>>>>> this >>>>>>>>>> code change looks reasonably straight-forward to me and not much >>>>> worth >>>>>>>>>> deferring to later when the deprecated ones are removed. >>>>>>>>>> >>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add >>>> new >>>>>>>>>> functions seems not very useful for users either since it is only >>>>> used >>>>>>> as >>>>>>>>>> an indicator but users cannot make code changes during this phase >>>>>>>>> anyways, >>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the >>>>>>> deprecated >>>>>>>>>> ones and add the new one. >>>>>>>>>> >>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it >>>>> with >>>>>>>>> new >>>>>>>>>> functions in one release, but if people have a good idea of the >>>>>>> renaming >>>>>>>>>> approach (I do not have a good one on top of my head :) I can also >>>> be >>>>>>>>>> convinced that way. >>>>>>>>>> >>>>>>>>>> Guozhang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax < >>>>>>> matth...@confluent.io> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I am open to change the return type to >>>>>>>>>>> >>>>>>>>>>> KeyValueIterator<Windowed<K>, V> >>>>>>>>>>> >>>>>>>>>>> However, this requires to rename >>>>>>>>>>> >>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp) >>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp) >>>>>>>>>>> >>>>>>>>>>> to avoid naming conflicts. >>>>>>>>>>> >>>>>>>>>>> What new name would you suggest? The existing methods are called >>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`. >>>>>>>>>>> >>>>>>>>>>> While I think it would be good to get fully aligned return types, >>>> I >>>>> am >>>>>>>>>>> not sure how we can get aligned method names (without renaming all >>>>> of >>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned >>>>>>> up, I >>>>>>>>>>> am no opposed. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thoughts? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote: >>>>>>>>>>>> I was thinking about changing the return type even, to >>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users >>>>>>> about >>>>>>>>>>> the >>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it >>>> clearly >>>>>>>>>>> either), >>>>>>>>>>>> note it is not backward compatible at all. >>>>>>>>>>>> >>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones >>>>> that >>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most >>>>>>> people >>>>>>>>>>> felt >>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with >>>>>>>>>>>> `KeyValueIterator<Long, V>` as well. >>>>>>>>>>>> >>>>>>>>>>>> Guozhang >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman < >>>>>>>>>>> sop...@confluent.io> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I remember thinking this while working on window stores, am >>>>>>> definitely >>>>>>>>>>> for >>>>>>>>>>>>> it. >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <j...@confluent.io >>>>> >>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias! >>>>>>>>>>>>>> -John >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax < >>>>>>>>>>> matth...@confluent.io> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface >>>>>>>>>>>>>>> `WindowStoreIterator`. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looking forward to your feedback. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >>
signature.asc
Description: OpenPGP digital signature