Soliciting more feedback before vote. On Wed, Oct 18, 2017 at 8:26 PM, Richard Yu <yohan.richard...@gmail.com> wrote:
> Is this KIP close to completion? Because we could start working on the > code itself now. (Its at about this stage). > > On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yohan.richard...@gmail.com> > wrote: > >> As Guozhang Wang mentioned earlier, we want to mirror the structure of >> similar Store class (namely KTable). The WindowedStore class might be >> unique in itself as it uses fetch() methods, but in my opinion, uniformity >> should be better suited for simplicity. >> >> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xav...@confluent.io> >> wrote: >> >>> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions >>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range" >>> keyword for when we query a specific range of keys? >>> >>> Xavier >>> >>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yohan.richard...@gmail.com> >>> wrote: >>> >>> > Thanks for the clarifications, Xavier. >>> > I have removed most of the methods except for keys() and all() which >>> has >>> > been renamed to Guozhang Wang's suggestions. >>> > >>> > Hope this helps. >>> > >>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xav...@confluent.io> >>> > wrote: >>> > >>> > > Thanks for the KIP Richard, this is a very useful addition! >>> > > >>> > > As far as the API changes, I just have a few comments on the methods >>> that >>> > > don't seem directly related to the KIP title, and naming of course >>> :). >>> > > On the implementation, see my notes further down that will hopefully >>> > > clarify a few things. >>> > > >>> > > Regarding the "bonus" methods: >>> > > I agree with Guozhang that the KIP lacks proper motivation for >>> adding the >>> > > min, max, and allLatest methods. >>> > > It is also not clear to me what min and max would really mean, what >>> > > ordering do we refer to here? Are we first ordering by time, then >>> key, or >>> > > first by key, then time? >>> > > The allLatest method might be useful, but I don't really see how it >>> would >>> > > be used in practice if we have to scan the entire range of keys for >>> all >>> > the >>> > > state stores, every single time. >>> > > >>> > > Maybe we could flesh the motivation behind those extra methods, but >>> in >>> > the >>> > > interest of time, and moving the KIP forward it might make sense to >>> file >>> > a >>> > > follow-up once we have more concrete use-cases. >>> > > >>> > > On naming: >>> > > I also agree with Guozhang that "keys()" should be renamed. It feels >>> a >>> > bit >>> > > of a misnomer, since it not only returns keys, but also the values. >>> > > >>> > > As far as what to rename it to, I would argue we already have some >>> > > discrepancy between key-value stores using range() vs. window stores >>> > using >>> > > fetch(). >>> > > I assume we called the window method "fetch" instead of "get" >>> because you >>> > > might get back more than one window for the requested key. >>> > > >>> > > If we wanted to make things consistent with both existing key-value >>> store >>> > > naming and window store naming, we could do the following: >>> > > Decide that "all" always refers to the entire range of keys, >>> independent >>> > of >>> > > the window and similarly "range" always refers to a particular range >>> of >>> > > keys, irrespective of the window. >>> > > We can then prefix methods with "fetch" to indicate that more than >>> one >>> > > window may be returned for each key in the range. >>> > > >>> > > This would give us: >>> > > - a new fetchAll() method for all the keys, which makes it clear >>> that you >>> > > might get back the same key in different windows >>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a >>> given >>> > > time range, again with possibly more than one window per key >>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, >>> long, >>> > > long) and deprecate the old one to indicate a range of keys >>> > > >>> > > One inconsistency I noted: the "Proposed Changes" section in your KIP >>> > talks >>> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer >>> to >>> > the >>> > > all(from, to) method, but I'm sure you'll fix that once we decide on >>> > > naming. >>> > > >>> > > On the implementation side: >>> > > You mentioned that caching and rocksdb store have very different >>> > key/value >>> > > structures, and while it appears to be that way on the surface, the >>> > > structure between the two is actually very similar. Keys in the >>> cache are >>> > > prefixed with a segment ID to ensure the ordering in the cache stays >>> > > consistent with the rocksdb implementation, which maintains multiple >>> > > rocksdb instances, one for each segment. So we just "artificially" >>> mirror >>> > > the segment structure in the cache. >>> > > >>> > > The reason for keeping the ordering consistent is pretty simple: >>> keep in >>> > > mind that when we query a cached window store we are effectively >>> querying >>> > > both the cache and the persistent rocksdb store at the same time, >>> merging >>> > > results from both. To make that merge as painless as possible, we >>> ensure >>> > > the ordering is consistent when querying a range of keys in both >>> stores. >>> > > >>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple >>> > window >>> > > stores within a topology. >>> > > >>> > > Hope this clarifies some of the less trivial parts of caching window >>> > store. >>> > > >>> > > Cheers, >>> > > Xavier >>> > > >>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wangg...@gmail.com> >>> wrote: >>> > > >>> > > > Richard, Matthias: >>> > > > >>> > > > 0. Could you describe a bit what are the possible use cases of >>> > > `allLatest`, >>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a >>> minimum >>> > > > necessary amount, to avoid a swamp of new APIs that no one would >>> really >>> > > use >>> > > > but just complicated the internal code base. >>> > > > >>> > > > 1. One minor comment on the other two new APIs: could we rename >>> `keys` >>> > to >>> > > > `all` and `all` to `range` to be consistent with the other store's >>> > APIs? >>> > > > >>> > > > 2. One meta comment on the implementation details: since both >>> `keys` >>> > and >>> > > > `all` would likely touch multiple segments, we may need to use the >>> > > internal >>> > > > `SegmentIterator` class, but currently it always requires a Bytes >>> from >>> > > and >>> > > > Bytes to for its constructor. What changes we need to make for that >>> > > class? >>> > > > >>> > > > >>> > > > Guozhang >>> > > > >>> > > > >>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu < >>> yohan.richard...@gmail.com >>> > > >>> > > > wrote: >>> > > > >>> > > > > We should split KAFKA-4499 into several sub-issues with 4499 >>> being >>> > the >>> > > > > parent issue. >>> > > > > Adding the implementation to CachingWindowStore, >>> RocksDBWindowStore, >>> > > etc >>> > > > > will each require the addition of a test and implementing the >>> methods >>> > > > which >>> > > > > is not trivial. >>> > > > > This way, it should be easier to manage the progress of the KIP. >>> > > > > >>> > > > > >>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax < >>> > matth...@confluent.io >>> > > > >>> > > > > wrote: >>> > > > > >>> > > > > > Thanks for driving this and sorry for late response. With >>> release >>> > > > > > deadline it was pretty busy lately. >>> > > > > > >>> > > > > > Can you please add a description for the suggested method, what >>> > they >>> > > > are >>> > > > > > going to return? It's a little unclear to me atm. >>> > > > > > >>> > > > > > It would also be helpful to discuss, for which use case each >>> method >>> > > is >>> > > > > > useful. This might also help to identify potential gaps for >>> which >>> > > > > > another API might be more helpful. >>> > > > > > >>> > > > > > Also, we should talk about provided guarantees when using those >>> > APIs >>> > > > > > with regard to consistency -- not saying that we need to >>> provide >>> > > strong >>> > > > > > guarantees, but he KIP should describe what user can expect. >>> > > > > > >>> > > > > > >>> > > > > > -Matthias >>> > > > > > >>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote: >>> > > > > > > Hello, I would like to solicit review and comment on this >>> issue >>> > > (link >>> > > > > > > below): >>> > > > > > > >>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore >>> > > > > > > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > > >>> > > > >>> > > > -- >>> > > > -- Guozhang >>> > > > >>> > > >>> > >>> >> >> >