Soliciting more feedback before vote.

On Wed, Oct 18, 2017 at 8:26 PM, Richard Yu <yohan.richard...@gmail.com>
wrote:

> Is this KIP close to completion? Because we could start working on the
> code itself now. (Its at about this stage).
>
> On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yohan.richard...@gmail.com>
> wrote:
>
>> As Guozhang Wang mentioned earlier, we want to mirror the structure of
>> similar Store class (namely KTable). The WindowedStore class might be
>> unique in itself as it uses fetch() methods, but in my opinion, uniformity
>> should be better suited for simplicity.
>>
>> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xav...@confluent.io>
>> wrote:
>>
>>> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
>>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
>>> keyword for when we query a specific range of keys?
>>>
>>> Xavier
>>>
>>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yohan.richard...@gmail.com>
>>> wrote:
>>>
>>> > Thanks for the clarifications, Xavier.
>>> > I have removed most of the methods except for keys() and all() which
>>> has
>>> > been renamed to Guozhang Wang's suggestions.
>>> >
>>> > Hope this helps.
>>> >
>>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xav...@confluent.io>
>>> > wrote:
>>> >
>>> > > Thanks for the KIP Richard, this is a very useful addition!
>>> > >
>>> > > As far as the API changes, I just have a few comments on the methods
>>> that
>>> > > don't seem directly related to the KIP title, and naming of course
>>> :).
>>> > > On the implementation, see my notes further down that will hopefully
>>> > > clarify a few things.
>>> > >
>>> > > Regarding the "bonus" methods:
>>> > > I agree with Guozhang that the KIP lacks proper motivation for
>>> adding the
>>> > > min, max, and allLatest methods.
>>> > > It is also not clear to me what min and max would really mean, what
>>> > > ordering do we refer to here? Are we first ordering by time, then
>>> key, or
>>> > > first by key, then time?
>>> > > The allLatest method might be useful, but I don't really see how it
>>> would
>>> > > be used in practice if we have to scan the entire range of keys for
>>> all
>>> > the
>>> > > state stores, every single time.
>>> > >
>>> > > Maybe we could flesh the motivation behind those extra methods, but
>>> in
>>> > the
>>> > > interest of time, and moving the KIP forward it might make sense to
>>> file
>>> > a
>>> > > follow-up once we have more concrete use-cases.
>>> > >
>>> > > On naming:
>>> > > I also agree with Guozhang that "keys()" should be renamed. It feels
>>> a
>>> > bit
>>> > > of a misnomer, since it not only returns keys, but also the values.
>>> > >
>>> > > As far as what to rename it to, I would argue we already have some
>>> > > discrepancy between key-value stores using range() vs. window stores
>>> > using
>>> > > fetch().
>>> > > I assume we called the window method "fetch" instead of "get"
>>> because you
>>> > > might get back more than one window for the requested key.
>>> > >
>>> > > If we wanted to make things consistent with both existing key-value
>>> store
>>> > > naming and window store naming, we could do the following:
>>> > > Decide that "all" always refers to the entire range of keys,
>>> independent
>>> > of
>>> > > the window and similarly "range" always refers to a particular range
>>> of
>>> > > keys, irrespective of the window.
>>> > > We can then prefix methods with "fetch" to indicate that more than
>>> one
>>> > > window may be returned for each key in the range.
>>> > >
>>> > > This would give us:
>>> > > - a new fetchAll() method for all the keys, which makes it clear
>>> that you
>>> > > might get back the same key in different windows
>>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
>>> given
>>> > > time range, again with possibly more than one window per key
>>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
>>> long,
>>> > > long)  and deprecate the old one to indicate a range of keys
>>> > >
>>> > > One inconsistency I noted: the "Proposed Changes" section in your KIP
>>> > talks
>>> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer
>>> to
>>> > the
>>> > > all(from, to) method, but I'm sure you'll fix that once we decide on
>>> > > naming.
>>> > >
>>> > > On the implementation side:
>>> > > You mentioned that caching and rocksdb store have very different
>>> > key/value
>>> > > structures, and while it appears to be that way on the surface, the
>>> > > structure between the two is actually very similar. Keys in the
>>> cache are
>>> > > prefixed with a segment ID to ensure the ordering in the cache stays
>>> > > consistent with the rocksdb implementation, which maintains multiple
>>> > > rocksdb instances, one for each segment. So we just "artificially"
>>> mirror
>>> > > the segment structure in the cache.
>>> > >
>>> > > The reason for keeping the ordering consistent is pretty simple:
>>> keep in
>>> > > mind that when we query a cached window store we are effectively
>>> querying
>>> > > both the cache and the persistent rocksdb store at the same time,
>>> merging
>>> > > results from both. To make that merge as painless as possible, we
>>> ensure
>>> > > the ordering is consistent when querying a range of keys in both
>>> stores.
>>> > >
>>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
>>> > window
>>> > > stores within a topology.
>>> > >
>>> > > Hope this clarifies some of the less trivial parts of caching window
>>> > store.
>>> > >
>>> > > Cheers,
>>> > > Xavier
>>> > >
>>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wangg...@gmail.com>
>>> wrote:
>>> > >
>>> > > > Richard, Matthias:
>>> > > >
>>> > > > 0. Could you describe a bit what are the possible use cases of
>>> > > `allLatest`,
>>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
>>> minimum
>>> > > > necessary amount, to avoid a swamp of new APIs that no one would
>>> really
>>> > > use
>>> > > > but just complicated the internal code base.
>>> > > >
>>> > > > 1. One minor comment on the other two new APIs: could we rename
>>> `keys`
>>> > to
>>> > > > `all` and `all` to `range` to be consistent with the other store's
>>> > APIs?
>>> > > >
>>> > > > 2. One meta comment on the implementation details: since both
>>> `keys`
>>> > and
>>> > > > `all` would likely touch multiple segments, we may need to use the
>>> > > internal
>>> > > > `SegmentIterator` class, but currently it always requires a Bytes
>>> from
>>> > > and
>>> > > > Bytes to for its constructor. What changes we need to make for that
>>> > > class?
>>> > > >
>>> > > >
>>> > > > Guozhang
>>> > > >
>>> > > >
>>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
>>> yohan.richard...@gmail.com
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > We should split KAFKA-4499 into several sub-issues with 4499
>>> being
>>> > the
>>> > > > > parent issue.
>>> > > > > Adding the implementation to CachingWindowStore,
>>> RocksDBWindowStore,
>>> > > etc
>>> > > > > will each require the addition of a test and implementing the
>>> methods
>>> > > > which
>>> > > > > is not trivial.
>>> > > > > This way, it should be easier to manage the progress of the KIP.
>>> > > > >
>>> > > > >
>>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
>>> > matth...@confluent.io
>>> > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Thanks for driving this and sorry for late response. With
>>> release
>>> > > > > > deadline it was pretty busy lately.
>>> > > > > >
>>> > > > > > Can you please add a description for the suggested method, what
>>> > they
>>> > > > are
>>> > > > > > going to return? It's a little unclear to me atm.
>>> > > > > >
>>> > > > > > It would also be helpful to discuss, for which use case each
>>> method
>>> > > is
>>> > > > > > useful. This might also help to identify potential gaps for
>>> which
>>> > > > > > another API might be more helpful.
>>> > > > > >
>>> > > > > > Also, we should talk about provided guarantees when using those
>>> > APIs
>>> > > > > > with regard to consistency -- not saying that we need to
>>> provide
>>> > > strong
>>> > > > > > guarantees, but he KIP should describe what user can expect.
>>> > > > > >
>>> > > > > >
>>> > > > > > -Matthias
>>> > > > > >
>>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
>>> > > > > > > Hello, I would like to solicit review and comment on this
>>> issue
>>> > > (link
>>> > > > > > > below):
>>> > > > > > >
>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > -- Guozhang
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Reply via email to