Hi all, My comments on this KIP:
1. I would use `all()` instead of `range()` because the functionality is immediately clear without the need to look at the parameter list. 2. I would decouple method names from metrics name, because this allows us to change one naming independently from the other in the future. From the previous discussion on this thread, I have also the feeling that the naming requirements differ between Java code and metrics. My proposal for the metric names is the following: value-by-key range-by-key value-by-key-and-time range-by-key-range range-by-time-range range-by-key-and-time-range range-by-key-range-and-time-range range-all I omitted the metric types like latency-avg, etc. The first word denotes whether the query is a point query (`value`) or a range query (`range`). The words after the `by` denote the parameter types of the query. `range-all` is a special case. I hope, I did not miss any. We could also prefix the above names with `get-` if you think, it would make the names clearer. 3. Should `[storeType]-id` be `[storeType]-state-id`? 4. Some method signatures in the KIP under comment `// new` have still the `@Deprecated` annotation. Copy&Paste error? 5. Should `type = stream-[storeType]-metrics` be `type = streams-[storeType]-metrics` or do we need to keep `stream` for historical/backward-compatibility reasons? Best, Bruno On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <matth...@confluent.io> wrote: > > Thanks John and Guozhang. > > I also lean towards different metric names. > > While updating the KIP, I also realized that the whole store API for > window and session store is actually rather inconsistent. Hence, I > extended the scope of this KIP to cleanup both interfaces and changed > the KIP name to > > "Cleanup built-in Store interfaces" > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198 > > I also included a small change to KeyValueStore to align all built-in > stores holistically. > > > > Looking forward to your feedback. > > > > -Matthias > > > > On 5/7/19 1:19 AM, Guozhang Wang wrote: > > Thanks John. > > > > I think I'm convinced about not collapsing the metrics measuring for > > various function calls. Regarding the possible solution I'm a bit inclined > > to just distinguish on the metric names directly over on tags: since our > > current metrics naming are a tad messed up anyways, fixing them in one shot > > as a breaking change sounds reasonable to me. > > > > > > Guozhang > > > > > > On Mon, May 6, 2019 at 9:14 AM John Roesler <j...@confluent.io> wrote: > > > >> Thanks all (or range? ;) ) for the discussion. Good points all around. > >> > >> Although I see the allure of naming the metrics the same as the things > >> they're measuring, it seems not to be perfect. Seconding Matthias's latter > >> thought, I think it's likely you'd want to measure the method calls > >> independently, since the different range variants would have wildly > >> different characteristics, which could then lead you to want to orient the > >> storage differently to support particular use cases. > >> > >> Pointing out some structural characteristics (I know you all know this > >> stuff, I'm just constructing a table for analysis): > >> * Java supports method-name overloading. *Different* methods can have the > >> same names, distinguished by arg lists; it doesn't change the fact that > >> they are different methods. > >> * Metrics does not support metric-name overloading, but metric names do > >> have some structure we could exploit, if you consider the tags. > >> > >> It seems to me that there's actually more domain mismatch if we just have > >> one metric named "range", since (e.g., in the SessionStore proposal above) > >> the Java API has *four* methods named "range". > >> > >> Two potential solutions I see: > >> * hierarchical metric names: "range-single-key-all-time", > >> "range-key-range-all-time", "range-single-key-time-range", > >> "range-key-range-time-range", maybe with better names... I'm not the best > >> at this stuff. Hopefully, you see the point, though... they all start with > >> "range", which provides the association to the method, and all have a > >> suffix which identifies the overload being measured. > >> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range" > >> {"variant": "key-range-all-time"}, "range" {"variant": > >> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or > >> you could even split the tags up semantically, but my instinct says that > >> that would just make it harder to digest the metrics later on. > >> > >> Just some ideas. > >> -John > >> > >> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <matth...@confluent.io> > >> wrote: > >> > >>> Thanks for the input Guozhang. I was not aware of those dependencies. > >>> > >>> It might be good to align this KIP with the metrics cleanup. Not sure > >>> atm, if we should use different metric names for different overloads, > >>> even if those have the same method name? > >>> > >>> If we rename all method to `range()` and use the same metric name for > >>> all, one could argue that this is still fine, because the metric > >>> collects how often a range query is executed (regardless of the range > >>> itself). > >>> > >>> On the other hand, this would basically be a "roll up". It could still > >>> be valuable to distinguish between single-key-time-range, > >>> key-range-time-range, and all-range queries. Users could still aggregate > >>> those later if they are not interested in the details, while it's not > >>> possible for user to split a pre-aggregated metric into it's component. > >>> > >>> > >>> Input from others might be helpful here, too. > >>> > >>> > >>> -Matthias > >>> > >>> On 4/11/19 6:00 PM, Guozhang Wang wrote: > >>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I > >>>> realized there are a bunch of issues on metric names v.s. function > >> names, > >>>> e.g. some function named `fetchAll` are actually measure with `fetch`, > >>> etc. > >>>> So in that KIP I proposed to make the function name aligned with > >> metrics > >>>> name. So suppose we rename the functions from `fetch` to `range` I'd > >>>> suggest we make this change as part of KIP-444 as well. Note that it > >>> means > >>>> different functions with the same name `range` will be measured under a > >>>> single metric then. > >>>> > >>>> But still for function named `all` it will be measured under a separate > >>>> metric named `all`, so I'm just clarifying with you if that's the > >>> intention. > >>>> > >>>> > >>>> Guozhang > >>>> > >>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matth...@confluent.io > >>> > >>>> wrote: > >>>> > >>>>> I did not see a reason to rename `all()` to `range()`. `all()` does > >> not > >>>>> take any parameters to limit a range and is a good name IMHO. But I am > >>>>> not married to keep `all()` and if we think we should rename it, too, > >> I > >>>>> am fine with it. > >>>>> > >>>>> Not sure what connection you make to metrics though. Can you > >> elaborate? > >>>>> > >>>>> > >>>>> Would be interested to hear others opinions on this, too. > >>>>> > >>>>> > >>>>> -Matthias > >>>>> > >>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote: > >>>>>> I like the renaming, since it also aligns with our metrics cleanup > >>>>>> (KIP-444) which touches upon the store level metrics as well. > >>>>>> > >>>>>> One question: you seems still suggesting to keep "all" with the > >> current > >>>>>> name (and also using a separate metric for it), what's the difference > >>>>>> between this one and other "range" functions? > >>>>>> > >>>>>> > >>>>>> Guozhang > >>>>>> > >>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax < > >> matth...@confluent.io > >>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Thanks for the input. > >>>>>>> > >>>>>>>>> Just to clarify the naming conflicts is between the newly added > >>>>> function > >>>>>>>>> and the old functions that we want to deprecate / remove right? > >> The > >>>>>>> > >>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for > >>> which > >>>>>>> we want to change the return type. > >>>>>>> > >>>>>>> IMHO, we should not make a breaking change in a minor release. Thus, > >>> we > >>>>>>> could either only deprecate those fetch methods that return > >>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0. > >>>>>>> > >>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to > >>>>>>> consider all methods that are called "fetch": > >>>>>>> > >>>>>>> ReadOnlyWindowStore: > >>>>>>> > >>>>>>>> V fetch(K, long) > >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant) > >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) > >>>>>>> > >>>>>>> WindowStore: > >>>>>>> > >>>>>>>> WindowStoreIterator<V> fetch(K, long, long) > >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)> > >>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long) > >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) > >>>>>>> > >>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant) > >> that > >>>>>>> fetch over all keys. > >>>>>>> > >>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and > >>> rename > >>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason > >> to > >>>>>>> have range()/rangeAll(). > >>>>>>> > >>>>>>> If we do this, we might consider to rename methods for SessionStore, > >>>>>>> too. There is > >>>>>>> > >>>>>>>> ReadOnlySessionStore#fetch(K) > >>>>>>>> ReadOnlySessionStore#fetch(K, K) > >>>>>>>> SessionStore#findSessions(K, long, long) > >>>>>>>> SessionStore#findSessions(K, K, long, long) > >>>>>>>> SessionStore#fetchSession(K, long, long); > >>>>>>> > >>>>>>> Consistent renaming might be: > >>>>>>> > >>>>>>> ReadOnlySessionStore#range(K) > >>>>>>> ReadOnlySessionStore#range(K, K) > >>>>>>> SessionStore#range(K, long, long) > >>>>>>> SessionStore#range(K, K, long, long) > >>>>>>> SessionStore#get(K, long, long); > >>>>>>> > >>>>>>> > >>>>>>> Not sure if extensive renaming might have too big of an impact. > >>> However, > >>>>>>> `range()` and `get()` might actually be better names than `fetch()`, > >>>>>>> thus, it might also provide some additional value. > >>>>>>> > >>>>>>> Thoughts? > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> > >>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote: > >>>>>>>> Hello Matthias, > >>>>>>>> > >>>>>>>> Just to clarify the naming conflicts is between the newly added > >>>>> function > >>>>>>>> and the old functions that we want to deprecate / remove right? The > >>>>>>>> existing ones have different signatures with parameters so that > >> they > >>>>>>> should > >>>>>>>> not have conflicts. > >>>>>>>> > >>>>>>>> I was thinking about just make the change directly without > >>> deprecating > >>>>>>>> existing ones which would require users of 2.3 to make code changes > >>> -- > >>>>>>> this > >>>>>>>> code change looks reasonably straight-forward to me and not much > >>> worth > >>>>>>>> deferring to later when the deprecated ones are removed. > >>>>>>>> > >>>>>>>> On the other hand, just deprecating "WindowIterator" without add > >> new > >>>>>>>> functions seems not very useful for users either since it is only > >>> used > >>>>> as > >>>>>>>> an indicator but users cannot make code changes during this phase > >>>>>>> anyways, > >>>>>>>> so it is still a `one-cut` deal when we eventually remove the > >>>>> deprecated > >>>>>>>> ones and add the new one. > >>>>>>>> > >>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it > >>> with > >>>>>>> new > >>>>>>>> functions in one release, but if people have a good idea of the > >>>>> renaming > >>>>>>>> approach (I do not have a good one on top of my head :) I can also > >> be > >>>>>>>> convinced that way. > >>>>>>>> > >>>>>>>> Guozhang > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax < > >>>>> matth...@confluent.io> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> I am open to change the return type to > >>>>>>>>> > >>>>>>>>> KeyValueIterator<Windowed<K>, V> > >>>>>>>>> > >>>>>>>>> However, this requires to rename > >>>>>>>>> > >>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp) > >>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp) > >>>>>>>>> > >>>>>>>>> to avoid naming conflicts. > >>>>>>>>> > >>>>>>>>> What new name would you suggest? The existing methods are called > >>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`. > >>>>>>>>> > >>>>>>>>> While I think it would be good to get fully aligned return types, > >> I > >>> am > >>>>>>>>> not sure how we can get aligned method names (without renaming all > >>> of > >>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned > >>>>> up, I > >>>>>>>>> am no opposed. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thoughts? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -Matthias > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote: > >>>>>>>>>> I was thinking about changing the return type even, to > >>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users > >>>>> about > >>>>>>>>> the > >>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it > >> clearly > >>>>>>>>> either), > >>>>>>>>>> note it is not backward compatible at all. > >>>>>>>>>> > >>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones > >>> that > >>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most > >>>>> people > >>>>>>>>> felt > >>>>>>>>>> it is too intrusive for compatibility I can be convinced with > >>>>>>>>>> `KeyValueIterator<Long, V>` as well. > >>>>>>>>>> > >>>>>>>>>> Guozhang > >>>>>>>>>> > >>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman < > >>>>>>>>> sop...@confluent.io> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> I remember thinking this while working on window stores, am > >>>>> definitely > >>>>>>>>> for > >>>>>>>>>>> it. > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <j...@confluent.io > >>> > >>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Sounds great to me. Thanks, Matthias! > >>>>>>>>>>>> -John > >>>>>>>>>>>> > >>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax < > >>>>>>>>> matth...@confluent.io> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface > >>>>>>>>>>>>> `WindowStoreIterator`. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >>> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator > >>>>>>>>>>>>> > >>>>>>>>>>>>> Looking forward to your feedback. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > >> > > > > >