Is this KIP close to completion? Because we could start working on the code itself now. (Its at about this stage).
On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yohan.richard...@gmail.com> wrote: > As Guozhang Wang mentioned earlier, we want to mirror the structure of > similar Store class (namely KTable). The WindowedStore class might be > unique in itself as it uses fetch() methods, but in my opinion, uniformity > should be better suited for simplicity. > > On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xav...@confluent.io> > wrote: > >> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions >> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range" >> keyword for when we query a specific range of keys? >> >> Xavier >> >> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yohan.richard...@gmail.com> >> wrote: >> >> > Thanks for the clarifications, Xavier. >> > I have removed most of the methods except for keys() and all() which has >> > been renamed to Guozhang Wang's suggestions. >> > >> > Hope this helps. >> > >> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xav...@confluent.io> >> > wrote: >> > >> > > Thanks for the KIP Richard, this is a very useful addition! >> > > >> > > As far as the API changes, I just have a few comments on the methods >> that >> > > don't seem directly related to the KIP title, and naming of course :). >> > > On the implementation, see my notes further down that will hopefully >> > > clarify a few things. >> > > >> > > Regarding the "bonus" methods: >> > > I agree with Guozhang that the KIP lacks proper motivation for adding >> the >> > > min, max, and allLatest methods. >> > > It is also not clear to me what min and max would really mean, what >> > > ordering do we refer to here? Are we first ordering by time, then >> key, or >> > > first by key, then time? >> > > The allLatest method might be useful, but I don't really see how it >> would >> > > be used in practice if we have to scan the entire range of keys for >> all >> > the >> > > state stores, every single time. >> > > >> > > Maybe we could flesh the motivation behind those extra methods, but in >> > the >> > > interest of time, and moving the KIP forward it might make sense to >> file >> > a >> > > follow-up once we have more concrete use-cases. >> > > >> > > On naming: >> > > I also agree with Guozhang that "keys()" should be renamed. It feels a >> > bit >> > > of a misnomer, since it not only returns keys, but also the values. >> > > >> > > As far as what to rename it to, I would argue we already have some >> > > discrepancy between key-value stores using range() vs. window stores >> > using >> > > fetch(). >> > > I assume we called the window method "fetch" instead of "get" because >> you >> > > might get back more than one window for the requested key. >> > > >> > > If we wanted to make things consistent with both existing key-value >> store >> > > naming and window store naming, we could do the following: >> > > Decide that "all" always refers to the entire range of keys, >> independent >> > of >> > > the window and similarly "range" always refers to a particular range >> of >> > > keys, irrespective of the window. >> > > We can then prefix methods with "fetch" to indicate that more than one >> > > window may be returned for each key in the range. >> > > >> > > This would give us: >> > > - a new fetchAll() method for all the keys, which makes it clear that >> you >> > > might get back the same key in different windows >> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a >> given >> > > time range, again with possibly more than one window per key >> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, >> long, >> > > long) and deprecate the old one to indicate a range of keys >> > > >> > > One inconsistency I noted: the "Proposed Changes" section in your KIP >> > talks >> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer >> to >> > the >> > > all(from, to) method, but I'm sure you'll fix that once we decide on >> > > naming. >> > > >> > > On the implementation side: >> > > You mentioned that caching and rocksdb store have very different >> > key/value >> > > structures, and while it appears to be that way on the surface, the >> > > structure between the two is actually very similar. Keys in the cache >> are >> > > prefixed with a segment ID to ensure the ordering in the cache stays >> > > consistent with the rocksdb implementation, which maintains multiple >> > > rocksdb instances, one for each segment. So we just "artificially" >> mirror >> > > the segment structure in the cache. >> > > >> > > The reason for keeping the ordering consistent is pretty simple: keep >> in >> > > mind that when we query a cached window store we are effectively >> querying >> > > both the cache and the persistent rocksdb store at the same time, >> merging >> > > results from both. To make that merge as painless as possible, we >> ensure >> > > the ordering is consistent when querying a range of keys in both >> stores. >> > > >> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple >> > window >> > > stores within a topology. >> > > >> > > Hope this clarifies some of the less trivial parts of caching window >> > store. >> > > >> > > Cheers, >> > > Xavier >> > > >> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wangg...@gmail.com> >> wrote: >> > > >> > > > Richard, Matthias: >> > > > >> > > > 0. Could you describe a bit what are the possible use cases of >> > > `allLatest`, >> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a >> minimum >> > > > necessary amount, to avoid a swamp of new APIs that no one would >> really >> > > use >> > > > but just complicated the internal code base. >> > > > >> > > > 1. One minor comment on the other two new APIs: could we rename >> `keys` >> > to >> > > > `all` and `all` to `range` to be consistent with the other store's >> > APIs? >> > > > >> > > > 2. One meta comment on the implementation details: since both `keys` >> > and >> > > > `all` would likely touch multiple segments, we may need to use the >> > > internal >> > > > `SegmentIterator` class, but currently it always requires a Bytes >> from >> > > and >> > > > Bytes to for its constructor. What changes we need to make for that >> > > class? >> > > > >> > > > >> > > > Guozhang >> > > > >> > > > >> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu < >> yohan.richard...@gmail.com >> > > >> > > > wrote: >> > > > >> > > > > We should split KAFKA-4499 into several sub-issues with 4499 being >> > the >> > > > > parent issue. >> > > > > Adding the implementation to CachingWindowStore, >> RocksDBWindowStore, >> > > etc >> > > > > will each require the addition of a test and implementing the >> methods >> > > > which >> > > > > is not trivial. >> > > > > This way, it should be easier to manage the progress of the KIP. >> > > > > >> > > > > >> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax < >> > matth...@confluent.io >> > > > >> > > > > wrote: >> > > > > >> > > > > > Thanks for driving this and sorry for late response. With >> release >> > > > > > deadline it was pretty busy lately. >> > > > > > >> > > > > > Can you please add a description for the suggested method, what >> > they >> > > > are >> > > > > > going to return? It's a little unclear to me atm. >> > > > > > >> > > > > > It would also be helpful to discuss, for which use case each >> method >> > > is >> > > > > > useful. This might also help to identify potential gaps for >> which >> > > > > > another API might be more helpful. >> > > > > > >> > > > > > Also, we should talk about provided guarantees when using those >> > APIs >> > > > > > with regard to consistency -- not saying that we need to provide >> > > strong >> > > > > > guarantees, but he KIP should describe what user can expect. >> > > > > > >> > > > > > >> > > > > > -Matthias >> > > > > > >> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote: >> > > > > > > Hello, I would like to solicit review and comment on this >> issue >> > > (link >> > > > > > > below): >> > > > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- >> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > -- Guozhang >> > > > >> > > >> > >> > >