Let's see what other think about (2)
(3) Interesting. My reasoning follows the code: For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns "rocksdb-state" and this is concatenated later in `MeteredKeyValueStore` that adds a tag key: metricScope + "-id" value: name() // this is the store name Hence, the `-state` part comes from the supplier and thus it seems to be part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure if this interpretation holds). If it's not part of `[store-type]` the supplier should not return it as `metricScope()` IMHO, but the `MeteredKeyValueStore` should add it together with `-id` suffix from my understanding. Thoughts? (5) Renaming to `streams` might imply that we need to rename _all_ metrics, not just the store metrics that are affected, to avoid inconsistencies. If we really want to rename it, I would rather suggest to do it as part of KIP-444, that is a larger rework anyway. It seems to be out of scope for this KIP. -Matthias On 5/28/19 12:30 PM, Bruno Cadonna wrote: > Hi Matthias, > > 2) > Yes, this is how I meant it. > I am still in favour of `value-by-...` instead of `get`, because it > gives you immediately the meaning of the metric without the need to > know the API of the stores. > > 3) > I asked to avoid misunderstandings because in KIP-444, `-state-` is > stated explicitly in the tag. > > 5) > The question was not about correctness. I know that we use `stream` in > the metrics group. The question was rather if we want to change it. > The streams metrics interface is called `StreamsMetrics`. I understand > that this has low priority. Just wanted to mention it. > > Best, > Bruno > > > On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <matth...@confluent.io> wrote: >> >> Thanks Bruno. >> >> I updated the KIP without changing the metric name yet -- want to >> discuss this first. >> >> >> >> 1) I am also ok to keep `all()` >> >> >> 2) That's a good idea. To make sure I understand your suggestion >> correctly, below the mapping between metric name an methods. >> (Some metric names are use by multiple stores): >> >> (ReadOnly)KeyValueStore: >> >>>> value-by-key : ReadOnlyKeyValueStore#get(K key) >> >> -> replaces `get` metric >> >> I am actually not sure if I like this. Just keeping `get` instead of >> `value-by-key`, `value-by-key-and-time` seems more reasonable to me. >> >> >> >> (ReadOnly)WindowStore: >> >>>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long >>>> windowStartTime) >> >> I think a simple `get` might be better >> >>>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant >>>> from, Instant to) >>>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long >>>> toTime) >> >>>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K >>>> to, Instant from, Instant to) >>>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long >>>> fromTime, long toTime) >> >>>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to) >>>> range-by-time-range : WindowStore#range(long fromTime, long toTime) >> >>>> range-all : ReadOnlyWindowStore#all() >> >> >> (ReadOnly)SessionStore: >> >>>> range-by-key : ReadOnlySessionStore#range(K key) >> >>>> range-by-key-range : ReadOnlySessionStore#range(K from, K to) >>>> range-by-key-and-time-range : SessionStore#range(K key, long >>>> earliestSessionEndTime, long latestSessionStartTime) >> >>>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, >>>> long earliestSessionEndTime, long latestSessionStartTime) >> >>>> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, >>>> long sessionEndTime) >> >> I think a simple `get` might be better >> >>>> range-all : ReadOnlyKeyValueStore#all() >> >> >> >> >> 3) the `state-` part is already contained in `[storeType]` do I think >> it's correct as-is >> >> >> 4) Ack. Fixed. >> >> >> 5) I think `stream` (plural) is correct. Cf >> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87 >> >> >> >> -Matthias >> >> >> On 5/28/19 3:26 AM, Bruno Cadonna wrote: >>> Hi all, >>> >>> My comments on this KIP: >>> >>> 1. I would use `all()` instead of `range()` because the functionality >>> is immediately clear without the need to look at the parameter list. >>> >>> 2. I would decouple method names from metrics name, because this >>> allows us to change one naming independently from the other in the >>> future. From the previous discussion on this thread, I have also the >>> feeling that the naming requirements differ between Java code and >>> metrics. >>> >>> My proposal for the metric names is the following: >>> >>> value-by-key >>> range-by-key >>> value-by-key-and-time >>> range-by-key-range >>> range-by-time-range >>> range-by-key-and-time-range >>> range-by-key-range-and-time-range >>> range-all >>> >>> I omitted the metric types like latency-avg, etc. The first word >>> denotes whether the query is a point query (`value`) or a range query >>> (`range`). The words after the `by` denote the parameter types of the >>> query. `range-all` is a special case. I hope, I did not miss any. >>> We could also prefix the above names with `get-` if you think, it >>> would make the names clearer. >>> >>> 3. Should `[storeType]-id` be `[storeType]-state-id`? >>> >>> 4. Some method signatures in the KIP under comment `// new` have still >>> the `@Deprecated` annotation. Copy&Paste error? >>> >>> 5. Should `type = stream-[storeType]-metrics` be `type = >>> streams-[storeType]-metrics` or do we need to keep `stream` for >>> historical/backward-compatibility reasons? >>> >>> Best, >>> Bruno >>> >>> On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <matth...@confluent.io> >>> wrote: >>>> >>>> Thanks John and Guozhang. >>>> >>>> I also lean towards different metric names. >>>> >>>> While updating the KIP, I also realized that the whole store API for >>>> window and session store is actually rather inconsistent. Hence, I >>>> extended the scope of this KIP to cleanup both interfaces and changed >>>> the KIP name to >>>> >>>> "Cleanup built-in Store interfaces" >>>> >>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198 >>>> >>>> I also included a small change to KeyValueStore to align all built-in >>>> stores holistically. >>>> >>>> >>>> >>>> Looking forward to your feedback. >>>> >>>> >>>> >>>> -Matthias >>>> >>>> >>>> >>>> On 5/7/19 1:19 AM, Guozhang Wang wrote: >>>>> Thanks John. >>>>> >>>>> I think I'm convinced about not collapsing the metrics measuring for >>>>> various function calls. Regarding the possible solution I'm a bit inclined >>>>> to just distinguish on the metric names directly over on tags: since our >>>>> current metrics naming are a tad messed up anyways, fixing them in one >>>>> shot >>>>> as a breaking change sounds reasonable to me. >>>>> >>>>> >>>>> Guozhang >>>>> >>>>> >>>>> On Mon, May 6, 2019 at 9:14 AM John Roesler <j...@confluent.io> wrote: >>>>> >>>>>> Thanks all (or range? ;) ) for the discussion. Good points all around. >>>>>> >>>>>> Although I see the allure of naming the metrics the same as the things >>>>>> they're measuring, it seems not to be perfect. Seconding Matthias's >>>>>> latter >>>>>> thought, I think it's likely you'd want to measure the method calls >>>>>> independently, since the different range variants would have wildly >>>>>> different characteristics, which could then lead you to want to orient >>>>>> the >>>>>> storage differently to support particular use cases. >>>>>> >>>>>> Pointing out some structural characteristics (I know you all know this >>>>>> stuff, I'm just constructing a table for analysis): >>>>>> * Java supports method-name overloading. *Different* methods can have the >>>>>> same names, distinguished by arg lists; it doesn't change the fact that >>>>>> they are different methods. >>>>>> * Metrics does not support metric-name overloading, but metric names do >>>>>> have some structure we could exploit, if you consider the tags. >>>>>> >>>>>> It seems to me that there's actually more domain mismatch if we just have >>>>>> one metric named "range", since (e.g., in the SessionStore proposal >>>>>> above) >>>>>> the Java API has *four* methods named "range". >>>>>> >>>>>> Two potential solutions I see: >>>>>> * hierarchical metric names: "range-single-key-all-time", >>>>>> "range-key-range-all-time", "range-single-key-time-range", >>>>>> "range-key-range-time-range", maybe with better names... I'm not the best >>>>>> at this stuff. Hopefully, you see the point, though... they all start >>>>>> with >>>>>> "range", which provides the association to the method, and all have a >>>>>> suffix which identifies the overload being measured. >>>>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, >>>>>> "range" >>>>>> {"variant": "key-range-all-time"}, "range" {"variant": >>>>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . >>>>>> Or >>>>>> you could even split the tags up semantically, but my instinct says that >>>>>> that would just make it harder to digest the metrics later on. >>>>>> >>>>>> Just some ideas. >>>>>> -John >>>>>> >>>>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <matth...@confluent.io> >>>>>> wrote: >>>>>> >>>>>>> Thanks for the input Guozhang. I was not aware of those dependencies. >>>>>>> >>>>>>> It might be good to align this KIP with the metrics cleanup. Not sure >>>>>>> atm, if we should use different metric names for different overloads, >>>>>>> even if those have the same method name? >>>>>>> >>>>>>> If we rename all method to `range()` and use the same metric name for >>>>>>> all, one could argue that this is still fine, because the metric >>>>>>> collects how often a range query is executed (regardless of the range >>>>>>> itself). >>>>>>> >>>>>>> On the other hand, this would basically be a "roll up". It could still >>>>>>> be valuable to distinguish between single-key-time-range, >>>>>>> key-range-time-range, and all-range queries. Users could still aggregate >>>>>>> those later if they are not interested in the details, while it's not >>>>>>> possible for user to split a pre-aggregated metric into it's component. >>>>>>> >>>>>>> >>>>>>> Input from others might be helpful here, too. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote: >>>>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I >>>>>>>> realized there are a bunch of issues on metric names v.s. function >>>>>> names, >>>>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`, >>>>>>> etc. >>>>>>>> So in that KIP I proposed to make the function name aligned with >>>>>> metrics >>>>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd >>>>>>>> suggest we make this change as part of KIP-444 as well. Note that it >>>>>>> means >>>>>>>> different functions with the same name `range` will be measured under a >>>>>>>> single metric then. >>>>>>>> >>>>>>>> But still for function named `all` it will be measured under a separate >>>>>>>> metric named `all`, so I'm just clarifying with you if that's the >>>>>>> intention. >>>>>>>> >>>>>>>> >>>>>>>> Guozhang >>>>>>>> >>>>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matth...@confluent.io >>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does >>>>>> not >>>>>>>>> take any parameters to limit a range and is a good name IMHO. But I am >>>>>>>>> not married to keep `all()` and if we think we should rename it, too, >>>>>> I >>>>>>>>> am fine with it. >>>>>>>>> >>>>>>>>> Not sure what connection you make to metrics though. Can you >>>>>> elaborate? >>>>>>>>> >>>>>>>>> >>>>>>>>> Would be interested to hear others opinions on this, too. >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote: >>>>>>>>>> I like the renaming, since it also aligns with our metrics cleanup >>>>>>>>>> (KIP-444) which touches upon the store level metrics as well. >>>>>>>>>> >>>>>>>>>> One question: you seems still suggesting to keep "all" with the >>>>>> current >>>>>>>>>> name (and also using a separate metric for it), what's the difference >>>>>>>>>> between this one and other "range" functions? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Guozhang >>>>>>>>>> >>>>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax < >>>>>> matth...@confluent.io >>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks for the input. >>>>>>>>>>> >>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added >>>>>>>>> function >>>>>>>>>>>>> and the old functions that we want to deprecate / remove right? >>>>>> The >>>>>>>>>>> >>>>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for >>>>>>> which >>>>>>>>>>> we want to change the return type. >>>>>>>>>>> >>>>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus, >>>>>>> we >>>>>>>>>>> could either only deprecate those fetch methods that return >>>>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0. >>>>>>>>>>> >>>>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to >>>>>>>>>>> consider all methods that are called "fetch": >>>>>>>>>>> >>>>>>>>>>> ReadOnlyWindowStore: >>>>>>>>>>> >>>>>>>>>>>> V fetch(K, long) >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant) >>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) >>>>>>>>>>> >>>>>>>>>>> WindowStore: >>>>>>>>>>> >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long) >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)> >>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long) >>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant) >>>>>>>>>>> >>>>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant) >>>>>> that >>>>>>>>>>> fetch over all keys. >>>>>>>>>>> >>>>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and >>>>>>> rename >>>>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason >>>>>> to >>>>>>>>>>> have range()/rangeAll(). >>>>>>>>>>> >>>>>>>>>>> If we do this, we might consider to rename methods for SessionStore, >>>>>>>>>>> too. There is >>>>>>>>>>> >>>>>>>>>>>> ReadOnlySessionStore#fetch(K) >>>>>>>>>>>> ReadOnlySessionStore#fetch(K, K) >>>>>>>>>>>> SessionStore#findSessions(K, long, long) >>>>>>>>>>>> SessionStore#findSessions(K, K, long, long) >>>>>>>>>>>> SessionStore#fetchSession(K, long, long); >>>>>>>>>>> >>>>>>>>>>> Consistent renaming might be: >>>>>>>>>>> >>>>>>>>>>> ReadOnlySessionStore#range(K) >>>>>>>>>>> ReadOnlySessionStore#range(K, K) >>>>>>>>>>> SessionStore#range(K, long, long) >>>>>>>>>>> SessionStore#range(K, K, long, long) >>>>>>>>>>> SessionStore#get(K, long, long); >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Not sure if extensive renaming might have too big of an impact. >>>>>>> However, >>>>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`, >>>>>>>>>>> thus, it might also provide some additional value. >>>>>>>>>>> >>>>>>>>>>> Thoughts? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote: >>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>> >>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added >>>>>>>>> function >>>>>>>>>>>> and the old functions that we want to deprecate / remove right? The >>>>>>>>>>>> existing ones have different signatures with parameters so that >>>>>> they >>>>>>>>>>> should >>>>>>>>>>>> not have conflicts. >>>>>>>>>>>> >>>>>>>>>>>> I was thinking about just make the change directly without >>>>>>> deprecating >>>>>>>>>>>> existing ones which would require users of 2.3 to make code changes >>>>>>> -- >>>>>>>>>>> this >>>>>>>>>>>> code change looks reasonably straight-forward to me and not much >>>>>>> worth >>>>>>>>>>>> deferring to later when the deprecated ones are removed. >>>>>>>>>>>> >>>>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add >>>>>> new >>>>>>>>>>>> functions seems not very useful for users either since it is only >>>>>>> used >>>>>>>>> as >>>>>>>>>>>> an indicator but users cannot make code changes during this phase >>>>>>>>>>> anyways, >>>>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the >>>>>>>>> deprecated >>>>>>>>>>>> ones and add the new one. >>>>>>>>>>>> >>>>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it >>>>>>> with >>>>>>>>>>> new >>>>>>>>>>>> functions in one release, but if people have a good idea of the >>>>>>>>> renaming >>>>>>>>>>>> approach (I do not have a good one on top of my head :) I can also >>>>>> be >>>>>>>>>>>> convinced that way. >>>>>>>>>>>> >>>>>>>>>>>> Guozhang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax < >>>>>>>>> matth...@confluent.io> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am open to change the return type to >>>>>>>>>>>>> >>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> >>>>>>>>>>>>> >>>>>>>>>>>>> However, this requires to rename >>>>>>>>>>>>> >>>>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp) >>>>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp) >>>>>>>>>>>>> >>>>>>>>>>>>> to avoid naming conflicts. >>>>>>>>>>>>> >>>>>>>>>>>>> What new name would you suggest? The existing methods are called >>>>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`. >>>>>>>>>>>>> >>>>>>>>>>>>> While I think it would be good to get fully aligned return types, >>>>>> I >>>>>>> am >>>>>>>>>>>>> not sure how we can get aligned method names (without renaming all >>>>>>> of >>>>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned >>>>>>>>> up, I >>>>>>>>>>>>> am no opposed. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thoughts? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote: >>>>>>>>>>>>>> I was thinking about changing the return type even, to >>>>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users >>>>>>>>> about >>>>>>>>>>>>> the >>>>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it >>>>>> clearly >>>>>>>>>>>>> either), >>>>>>>>>>>>>> note it is not backward compatible at all. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones >>>>>>> that >>>>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most >>>>>>>>> people >>>>>>>>>>>>> felt >>>>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with >>>>>>>>>>>>>> `KeyValueIterator<Long, V>` as well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman < >>>>>>>>>>>>> sop...@confluent.io> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I remember thinking this while working on window stores, am >>>>>>>>> definitely >>>>>>>>>>>>> for >>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <j...@confluent.io >>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias! >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax < >>>>>>>>>>>>> matth...@confluent.io> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface >>>>>>>>>>>>>>>>> `WindowStoreIterator`. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Looking forward to your feedback. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>
signature.asc
Description: OpenPGP digital signature