Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Matthias J. Sax Thu, 23 May 2019 20:46:43 -0700

Thanks John and Guozhang.

I also lean towards different metric names.


While updating the KIP, I also realized that the whole store API for
window and session store is actually rather inconsistent. Hence, I
extended the scope of this KIP to cleanup both interfaces and changed
the KIP name to

   "Cleanup built-in Store interfaces"

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198

I also included a small change to KeyValueStore to align all built-in
stores holistically.



Looking forward to your feedback.



-Matthias



On 5/7/19 1:19 AM, Guozhang Wang wrote:
> Thanks John.
> 
> I think I'm convinced about not collapsing the metrics measuring for
> various function calls. Regarding the possible solution I'm a bit inclined
> to just distinguish on the metric names directly over on tags: since our
> current metrics naming are a tad messed up anyways, fixing them in one shot
> as a breaking change sounds reasonable to me.
> 
> 
> Guozhang
> 
> 
> On Mon, May 6, 2019 at 9:14 AM John Roesler <j...@confluent.io> wrote:
> 
>> Thanks all (or range? ;) ) for the discussion. Good points all around.
>>
>> Although I see the allure of naming the metrics the same as the things
>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
>> thought, I think it's likely you'd want to measure the method calls
>> independently, since the different range variants would have wildly
>> different characteristics, which could then lead you to want to orient the
>> storage differently to support particular use cases.
>>
>> Pointing out some structural characteristics (I know you all know this
>> stuff, I'm just constructing a table for analysis):
>> * Java supports method-name overloading. *Different* methods can have the
>> same names, distinguished by arg lists; it doesn't change the fact that
>> they are different methods.
>> * Metrics does not support metric-name overloading, but metric names do
>> have some structure we could exploit, if you consider the tags.
>>
>> It seems to me that there's actually more domain mismatch if we just have
>> one metric named "range", since (e.g., in the SessionStore proposal above)
>> the Java API has *four* methods named "range".
>>
>> Two potential solutions I see:
>> * hierarchical metric names: "range-single-key-all-time",
>> "range-key-range-all-time", "range-single-key-time-range",
>> "range-key-range-time-range", maybe with better names... I'm not the best
>> at this stuff. Hopefully, you see the point, though... they all start with
>> "range", which provides the association to the method, and all have a
>> suffix which identifies the overload being measured.
>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
>> {"variant": "key-range-all-time"}, "range" {"variant":
>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
>> you could even split the tags up semantically, but my instinct says that
>> that would just make it harder to digest the metrics later on.
>>
>> Just some ideas.
>> -John
>>
>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Thanks for the input Guozhang. I was not aware of those dependencies.
>>>
>>> It might be good to align this KIP with the metrics cleanup. Not sure
>>> atm, if we should use different metric names for different overloads,
>>> even if those have the same method name?
>>>
>>> If we rename all method to `range()` and use the same metric name for
>>> all, one could argue that this is still fine, because the metric
>>> collects how often a range query is executed (regardless of the range
>>> itself).
>>>
>>> On the other hand, this would basically be a "roll up". It could still
>>> be valuable to distinguish between single-key-time-range,
>>> key-range-time-range, and all-range queries. Users could still aggregate
>>> those later if they are not interested in the details, while it's not
>>> possible for user to split a pre-aggregated metric into it's component.
>>>
>>>
>>> Input from others might be helpful here, too.
>>>
>>>
>>> -Matthias
>>>
>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
>>>> realized there are a bunch of issues on metric names v.s. function
>> names,
>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
>>> etc.
>>>> So in that KIP I proposed to make the function name aligned with
>> metrics
>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
>>>> suggest we make this change as part of KIP-444 as well. Note that it
>>> means
>>>> different functions with the same name `range` will be measured under a
>>>> single metric then.
>>>>
>>>> But still for function named `all` it will be measured under a separate
>>>> metric named `all`, so I'm just clarifying with you if that's the
>>> intention.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matth...@confluent.io
>>>
>>>> wrote:
>>>>
>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
>> not
>>>>> take any parameters to limit a range and is a good name IMHO. But I am
>>>>> not married to keep `all()` and if we think we should rename it, too,
>> I
>>>>> am fine with it.
>>>>>
>>>>> Not sure what connection you make to metrics though. Can you
>> elaborate?
>>>>>
>>>>>
>>>>> Would be interested to hear others opinions on this, too.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
>>>>>> I like the renaming, since it also aligns with our metrics cleanup
>>>>>> (KIP-444) which touches upon the store level metrics as well.
>>>>>>
>>>>>> One question: you seems still suggesting to keep "all" with the
>> current
>>>>>> name (and also using a separate metric for it), what's the difference
>>>>>> between this one and other "range" functions?
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
>> matth...@confluent.io
>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the input.
>>>>>>>
>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>> function
>>>>>>>>> and the old functions that we want to deprecate / remove right?
>> The
>>>>>>>
>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
>>> which
>>>>>>> we want to change the return type.
>>>>>>>
>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
>>> we
>>>>>>> could either only deprecate those fetch methods that return
>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
>>>>>>>
>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
>>>>>>> consider all methods that are called "fetch":
>>>>>>>
>>>>>>> ReadOnlyWindowStore:
>>>>>>>
>>>>>>>> V fetch(K, long)
>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>
>>>>>>> WindowStore:
>>>>>>>
>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>
>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
>> that
>>>>>>> fetch over all keys.
>>>>>>>
>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
>>> rename
>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
>> to
>>>>>>> have range()/rangeAll().
>>>>>>>
>>>>>>> If we do this, we might consider to rename methods for SessionStore,
>>>>>>> too. There is
>>>>>>>
>>>>>>>> ReadOnlySessionStore#fetch(K)
>>>>>>>> ReadOnlySessionStore#fetch(K, K)
>>>>>>>> SessionStore#findSessions(K, long, long)
>>>>>>>> SessionStore#findSessions(K, K, long, long)
>>>>>>>> SessionStore#fetchSession(K, long, long);
>>>>>>>
>>>>>>> Consistent renaming might be:
>>>>>>>
>>>>>>> ReadOnlySessionStore#range(K)
>>>>>>> ReadOnlySessionStore#range(K, K)
>>>>>>> SessionStore#range(K, long, long)
>>>>>>> SessionStore#range(K, K, long, long)
>>>>>>> SessionStore#get(K, long, long);
>>>>>>>
>>>>>>>
>>>>>>> Not sure if extensive renaming might have too big of an impact.
>>> However,
>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
>>>>>>> thus, it might also provide some additional value.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
>>>>>>>> Hello Matthias,
>>>>>>>>
>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>> function
>>>>>>>> and the old functions that we want to deprecate / remove right? The
>>>>>>>> existing ones have different signatures with parameters so that
>> they
>>>>>>> should
>>>>>>>> not have conflicts.
>>>>>>>>
>>>>>>>> I was thinking about just make the change directly without
>>> deprecating
>>>>>>>> existing ones which would require users of 2.3 to make code changes
>>> --
>>>>>>> this
>>>>>>>> code change looks reasonably straight-forward to me and not much
>>> worth
>>>>>>>> deferring to later when the deprecated ones are removed.
>>>>>>>>
>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
>> new
>>>>>>>> functions seems not very useful for users either since it is only
>>> used
>>>>> as
>>>>>>>> an indicator but users cannot make code changes during this phase
>>>>>>> anyways,
>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
>>>>> deprecated
>>>>>>>> ones and add the new one.
>>>>>>>>
>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
>>> with
>>>>>>> new
>>>>>>>> functions in one release, but if people have a good idea of the
>>>>> renaming
>>>>>>>> approach (I do not have a good one on top of my head :) I can also
>> be
>>>>>>>> convinced that way.
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
>>>>> matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am open to change the return type to
>>>>>>>>>
>>>>>>>>> KeyValueIterator<Windowed<K>, V>
>>>>>>>>>
>>>>>>>>> However, this requires to rename
>>>>>>>>>
>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
>>>>>>>>>
>>>>>>>>> to avoid naming conflicts.
>>>>>>>>>
>>>>>>>>> What new name would you suggest? The existing methods are called
>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
>>>>>>>>>
>>>>>>>>> While I think it would be good to get fully aligned return types,
>> I
>>> am
>>>>>>>>> not sure how we can get aligned method names (without renaming all
>>> of
>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
>>>>> up, I
>>>>>>>>> am no opposed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
>>>>>>>>>> I was thinking about changing the return type even, to
>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
>>>>> about
>>>>>>>>> the
>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
>> clearly
>>>>>>>>> either),
>>>>>>>>>> note it is not backward compatible at all.
>>>>>>>>>>
>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
>>> that
>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
>>>>> people
>>>>>>>>> felt
>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
>>>>>>>>> sop...@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I remember thinking this while working on window stores, am
>>>>> definitely
>>>>>>>>> for
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <j...@confluent.io
>>>
>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
>>>>>>>>> matth...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
>>>>>>>>>>>>> `WindowStoreIterator`.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Reply via email to