Hi all, Thank you for the feedback and the interesting solutions you suggested.
The KIP and the corresponding PR are updated as follows: - Snapshot semantics is implemented in the `get(key, fromTime, toTime)` method to guarantee consistency. - The `ValueIterator` interface is replaced with `VersionedRecordIterator<V> implements Iterator<VersionedRecord<V>>`. Consequently, the `MultiVersionedKeyQuery` class is modified to ` MultiVersionedKeyQuery<K, V> implements Query<VersionedRecordIterator<V >>`. - The former `get(key,ts)` method is updated so that it returns records with their `validTo` timestamps as well. I still kept the old constructor of the `VersionedRecord` class with only two parameters. The `validTo` field is initialized by `MAX` as default, but I think the default value for the `validTo` field must be `*PUT_RETURN_CODE_VALID_TO_UNDEFINED*` which is defined in the `VersionedKeyValueStore` interface and is actually `-1`. Am I right? Open question: - It seems like we do not have a conclusion about the default ordering issue. Cheers, Alieh On Thu, Nov 16, 2023 at 9:56 AM Bruno Cadonna <cado...@apache.org> wrote: > Hi, > > 80) > We do not keep backwards compatibility with IQv1, right? I would even > say that currently we do not need to keep backwards compatibility among > IQv2 versions since we marked the API "Evolving" (do we only mean code > compatibility here or also behavioral compatibility?). I propose to try > to not limit ourselves for backwards compatibility that we explicitly > marked as evolving. > I re-read the discussion on KIP-985. In that discussion, we were quite > focused on what the state store provides. I see that for range queries, > we have methods on the state store interface that specify the order, but > that should be kind of orthogonal to the IQv2 query type. Let's assume > somebody in the future adds a state store implementation that is not > order based. To account for use cases where the order does not matter, > this person might also add a method to the state store interface that > does not guarantee any order. However, our range query type is specified > to guarantee order by default. So we need to add something like > withNoOrder() to the query type to allow the use cases that does not > need order and has the better performance in IQ. That does not look very > nice to me. Having the no-order-guaranteed option does not cost us > anything and it keeps the IQv2 interface flexible. I assume we want to > drop the Evolving annotation at some point. > Sorry for not having brought this up in the discussion about KIP-985. > > Best, > Bruno > > > > > > On 11/15/23 6:56 AM, Matthias J. Sax wrote: > > Just catching up on this one. > > > > > > 50) I am also in favor of setting `validTo` in VersionedRecord for > > single-key single-ts lookup; it seems better to return the proper > > timestamp. The timestamp is already in the store and it's cheap to > > extract it and add to the result, and it might be valuable information > > for the user. Not sure though if we should deprecate the existing > > constructor though, because for "latest" it's convenient to have? > > > > > > 60) Yes, I meant `VersionedRecord`. Sorry for the mixup. > > > > > > 80) We did discuss this question on KIP-985 (maybe you missed it Bruno). > > It's kinda tricky. > > > > Historically, it seems that IQv1, ie, the `ReadOnlyXxx` interfaces > > provide a clear contract that `range()` is ascending and > > `reverseRange()` is descending. > > > > For `RangeQuery`, the question is, if we did implicitly inherit this > > contract? Our conclusion on KIP-985 discussion was, that we did inherit > > it. If this holds true, changing the contract would be a breaking change > > (what might still be acceptable, given that the interface is annotated > > as unstable, and that IQv2 is not widely adopted yet). I am happy to go > > with the 3-option contract, but just want to ensure we all agree it's > > the right way to go, and we are potentially willing to pay the price of > > backward incompatibility. > > > > > > > >> Do we need a snapshot semantic or can we specify a weaker but still > >> useful semantic? > > > > I don't think we necessarily need it, but as pointed out by Lucas, all > > existing queries provide it. Overall, my main point is really about not > > implementing something "random", but defining a proper binding contract > > that allows users to reason about it. > > > > I general, I agree that weaker semantics might be sufficient, but I am > > not sure if we can implement anything weaker in a reasonable way? Happy > > to be convinced otherwise. (I have some example, that I will omit for > > now, as I hope we can actually go with snapshot semantics.) > > > > The RocksDB Snaptshot idea from Lucas sounds very promising. I was not > > aware that we could do this with RocksDB (otherwise I might have > > suggested it on the PR right away). I guess the only open question > > remaining would be, if we can provide the same guarantees for a future > > in-memory implementation for VersionedStores? It sounds possible to do, > > but we should have some level of confidence about it? > > > > > > -Matthias > > > > On 11/14/23 6:33 AM, Lucas Brutschy wrote: > >> Hi Alieh, > >> > >> I agree with Bruno that a weaker guarantee could be useful already, > >> and it's anyway possible to strengthen the guarantees later on. On the > >> other hand, it would be nice to provide a consistent view across all > >> segments if it doesn't come with major downsides, because until now IQ > >> does provide a consistent view (via iterators), and this would be the > >> first feature that diverges from that guarantee. > >> > >> I think a consistent could be achieved relatively easily by creating a > >> snapshot (https://github.com/facebook/rocksdb/wiki/Snapshot) that will > >> ensure a consistent view on a single DB and using > >> `ReadOptions.setSnapshot` for the gets. Since versioned state stores > >> segments seem to be backed by a single RocksDB instance (that was > >> unclear to me during our earlier discussion), a single snapshot should > >> be enough to guarantee a consistent view - we just need to make sure > >> to clean up the snapshots after use, similar to iterators. If we > >> instead need a consistent view across multiple RocksDB instances, we > >> may have to acquire a write lock on all of those (could use the > >> current object monitors of our `RocksDB` interface) for the duration > >> of creating snapshots across all instances - which I think would also > >> be permissible performance-wise. Snapshots are just a sequence number > >> and should be pretty lightweight to create (they have, however, > >> downside when it comes to compaction just like iterators). > >> > >> With that in mind, I would be in favor of at least exploring the > >> option of using snapshots for a consistent view here, before dropping > >> this useful guarantee. > >> > >> Cheers, > >> Lucas > >> > >> On Tue, Nov 14, 2023 at 2:20 PM Bruno Cadonna <cado...@apache.org> > wrote: > >>> > >>> Hi Alieh, > >>> > >>> Regarding the semantics/guarantees of the query type: > >>> > >>> Do we need a snapshot semantic or can we specify a weaker but still > >>> useful semantic? > >>> > >>> An option could be to guarantee that: > >>> 1. the query returns the latest version when the query arrived at the > >>> state store > >>> 2. the query returns a valid history, i.e., versions with adjacent and > >>> non-overlapping validity intervals. > >>> > >>> I think guarantee 1 is quite obvious. Guarantee 2 maybe needs some > >>> explanation. If we do not avoid writes to the state store during the > >>> processing of interactive queries, it might for example happen that the > >>> latest version in the state store moves to data structures that are > >>> responsible for older versions. In our RocksDB implementation that are > >>> the segments. Thus, it could be that during query processing Kafka > >>> Streams reads the latest value x and encounters again x in a segment > >>> because a processor put a newer version of x in the versioned state > >>> store. A similar scenario might also happen to earlier versions. If > >>> Streams does not account for such cases it could return invalid > >>> histories. > >>> > >>> Maybe such weaker guarantees are enough for most use cases. > >>> > >>> You could consider implementing weaker guarantees as I described and if > >>> there is demand propose stricter guarantees in a follow-up KIP. > >>> > >>> Maybe there are also other simpler guarantees that make sense. > >>> > >>> Best, > >>> Bruno > >>> > >>> > >>> On 11/9/23 12:30 PM, Bruno Cadonna wrote: > >>>> Hi, > >>>> > >>>> Thanks for the updates! > >>>> > >>>> First my take on previous comments: > >>>> > >>>> > >>>> 50) > >>>> I am in favor of deprecating the constructor that does not take the > >>>> validTo parameter. That implies that I am in favor of modifying > >>>> get(key, > >>>> asOf) to set the correct validTo. > >>>> > >>>> > >>>> 60) > >>>> I am in favor of renaming ValueIterator to VersionedRecordIterator and > >>>> define it as: > >>>> > >>>> public interface VersionedRecordIterator<V> extends > >>>> Iterator<VersionedRecord<V>> > >>>> > >>>> (Matthias, you mixed up ValueAndTimestamp with VersionedRecord in your > >>>> last e-mail, didn't you? Just double-checking if I understood what you > >>>> are proposing.) > >>>> > >>>> > >>>> 70) > >>>> I agree with Matthias that adding a new method on the > >>>> VersionedKeyValueStore interface defeats the purpose of one of the > >>>> goals > >>>> of IQv2, i.e., not to need to extend the state store interface for > >>>> IQ. I > >>>> would say if we do not need the method in normal processing, we should > >>>> not extend the public state store interface. BTW, nobody forces you to > >>>> StoreQueryUtils. I think that internal utils class was introduced for > >>>> convenience to leverage existing methods on the state store interface. > >>>> > >>>> > >>>> 80) > >>>> Why do we limit ourselves by specifying a default order on the result? > >>>> Different state store implementation might have different strategies > to > >>>> store the records which affects the order in which the records are > >>>> returned if they are not sorted before they are returned to the user. > >>>> Some users might not be interested in an order of the result and so > >>>> there is no reason those users pay the cost for sorting. I propose to > >>>> not specify a default order but sort the results (if needed) when > >>>> withDescendingX() and withAscendingX() is specified on the query > >>>> object. > >>>> > >>>> > >>>> Regarding the snapshot guarantees for the iterators, I need to think a > >>>> bit more about it. I will come back to this thread in the next days. > >>>> > >>>> > >>>> Best, > >>>> Bruno > >>>> > >>>> > >>>> On 11/8/23 5:30 PM, Alieh Saeedi wrote: > >>>>> Thank you, Bruno and Matthias, for keeping the discussion going and > >>>>> for > >>>>> reviewing the PR. > >>>>> > >>>>> Here are the KIP updates: > >>>>> > >>>>> - I removed the `peek()` from the `ValueIterator` interface > since > >>>>> we do > >>>>> not need it. > >>>>> - Yes, Bruno, the `validTo` field in the `VersionedRecod` > >>>>> class is > >>>>> exclusive. I updated the javadocs for that. > >>>>> > >>>>> > >>>>> Very important critical open questions. I list them here based on > >>>>> priority > >>>>> (descendingly). > >>>>> > >>>>> - I implemented the `get(key, fromtime, totime)` method here > >>>>> > >>>>> < > https://github.com/aliehsaeedii/kafka/blob/9578b7cb7cdade22cc63f671693f5aeb993937ca/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBVersionedStore.java#L262 > >: > >>>>> the problem is that this implementation does not guarantee > >>>>> consistency > >>>>> because processing might continue interleaved (no snapshot > >>>>> semantics is > >>>>> implemented). More over, it materializes all results in memory. > >>>>> - Solution 1: Use a lock and release it after retrieving all > >>>>> desired > >>>>> records from all segments. > >>>>> - positive point: snapshot semantics is implemented > >>>>> - negative points: 1) It is expensive since iterating > >>>>> over all > >>>>> segments may take a long time. 2) It still requires > >>>>> materializing results > >>>>> on memory > >>>>> - Solution 2: use `RocksDbIterator`. > >>>>> - positive points: 1) It guarantees snapshot segments. > >>>>> 2) It > >>>>> does > >>>>> not require materializing results in memory. > >>>>> - negative points: it is expensive because, anyway, we > >>>>> need to > >>>>> iterate over all (many) segments. > >>>>> > >>>>> Do you have any thoughts on this issue? (ref: Matthias's > >>>>> comment > >>>>> < > https://github.com/apache/kafka/pull/14626#pullrequestreview-1709280589>) > >>>>> > >>>>> - I added the field `validTo` in `VersionedRecord`. Its default > >>>>> value is > >>>>> MAX. But as Matthias mentioned, for the single-key single-ts > >>>>> (`VersionedKeyQuery` in KIP-960), it may not always be true. > >>>>> If the > >>>>> returned record belongs to an old segment, maybe it is not valid > >>>>> any more. > >>>>> So MAX is not the correct value for `ValidTo`. Two solutions > come > >>>>> to mind: > >>>>> - Solution 1: make the `ValidTo` as an `Optional` and set > >>>>> it to > >>>>> `empty` for the retuned result of `get(key, asOfTimestamp)`. > >>>>> - Solution 2: change the implementation of `get(key, > >>>>> asOfTimestamp)` > >>>>> so that it finds the correct `validTo` for the returned > >>>>> versionedRecord. > >>>>> > >>>>> - In this KIP and the next one, even though the default > >>>>> ordering is > >>>>> with ascending ts, I added the method > `withAscendingTimestamps()` > >>>>> to have > >>>>> more user readibility (as Bruno suggested), while Hanyu did > >>>>> only add > >>>>> `withDescending...` methods (he did not need ascending because > >>>>> that's the > >>>>> default anyway). Matthias believes that we should not have > >>>>> inconsistencies (he actually hates them:D). Shall I change my > KIP > >>>>> or Hanyu? > >>>>> Thoughts? > >>>>> > >>>>> > >>>>> That would be maybe helpful to look into the PR > >>>>> <https://github.com/apache/kafka/pull/14626> for more clarity and > even > >>>>> review that ;-) > >>>>> > >>>>> Cheers, > >>>>> Alieh > >>>>> > >>>>> On Thu, Nov 2, 2023 at 7:13 PM Bruno Cadonna <cado...@apache.org> > >>>>> wrote: > >>>>> > >>>>>> Hi Alieh, > >>>>>> > >>>>>> First of all, I like the examples. > >>>>>> > >>>>>> Is validTo in VersionedRecord exclusive or inclusive? > >>>>>> In the javadocs you write: > >>>>>> > >>>>>> "@param validTo the latest timestamp that value is valid" > >>>>>> > >>>>>> I think that is not true because the validity is defined by the > start > >>>>>> time of the new version. The new and the old version cannot both be > >>>>>> valid at that same time. > >>>>>> > >>>>>> Theoretically, you could set validTo to the start time of the new > >>>>>> version - 1. However, what is the unit of the 1? Is it nanoseconds? > >>>>>> Milliseconds? Seconds? Sure we could agree on one, but I think it is > >>>>>> more elegant to just make the validTo exclusive. Actually, you > >>>>>> used it > >>>>>> as exclusive in your examples. > >>>>>> > >>>>>> > >>>>>> Thanks for the KIP! > >>>>>> > >>>>>> Best, > >>>>>> Bruno > >>>>>> > >>>>>> On 11/1/23 9:01 PM, Alieh Saeedi wrote: > >>>>>>> Hi all, > >>>>>>> @Matthias: I think Victoria was right. I must add the method > >>>>>>> `get(key, > >>>>>>> fromTime, toTime)` to the interface `VersionedKeyValueStore`. Right > >>>>>>> now, > >>>>>>> having the method only in `RocksDBVersionedStore`, made me to > >>>>>>> have an > >>>>>>> instance of `RocksDBVersionedStore` (instead of > >>>>>>> `VersionedKeyValueStore`) > >>>>>>> in `StoreQueryUtils.runMultiVersionedKeyQuery()` method. In > >>>>>>> future, we > >>>>>> are > >>>>>>> going to use the same method for In-memory/SPDB/blaBla versioned > >>>>>>> stores. > >>>>>>> Then either this method won't work any more, or we have to add > >>>>>>> code (if > >>>>>>> clauses) for each type of versioned stores. What do you think about > >>>>>>> that? > >>>>>>> > >>>>>>> Bests, > >>>>>>> Alieh > >>>>>>> > >>>>>>> On Tue, Oct 24, 2023 at 10:01 PM Alieh Saeedi < > asae...@confluent.io> > >>>>>> wrote: > >>>>>>> > >>>>>>>> Thank you, Matthias, Bruno, and Guozhang for keeping the > discussion > >>>>>> going. > >>>>>>>> > >>>>>>>> Here is the list of changes I made: > >>>>>>>> > >>>>>>>> 1. I enriched the "Example" section as Bruno suggested. Do > >>>>>>>> you > >>>>>> please > >>>>>>>> have a look at that section? I think I devised an > >>>>>>>> interesting one > >>>>>> ;-) > >>>>>>>> 2. As Matthias and Guozhang suggested, I renamed variables > >>>>>>>> and > >>>>>> methods > >>>>>>>> as follows: > >>>>>>>> - "fromTimestamp" -> "fromTime" > >>>>>>>> - "asOfTimestamp" -> "toTime" > >>>>>>>> - "from(Instant fromTime)" -> "fromTime(Instant > fromTime)" > >>>>>>>> - "asOf(Instant toTime)" -> "toTime(Instant toTime)" > >>>>>>>> 3. > >>>>>>>> > >>>>>>>> About throwing an NPE when time bounds are `null`: Ok, > >>>>>>>> Matthias, I > >>>>>> am > >>>>>>>> convinced by your idea. I consider a null timestamp as "no > >>>>>>>> bound". > >>>>>> But I > >>>>>>>> had to change KIP-960 and the corresponding PR to be > >>>>>>>> consistent as > >>>>>> well. > >>>>>>>> > >>>>>>>> Answering questions and some more discussion points: > >>>>>>>> > >>>>>>>> 1. > >>>>>>>> > >>>>>>>> For now, I keep the class names as they are. > >>>>>>>> 2. > >>>>>>>> > >>>>>>>> About the new field "validTo" in VersionedRecord. Yes > >>>>>>>> Matthias I > >>>>>> keep > >>>>>>>> the old constructor, which does not have `validTo` as an > >>>>>>>> input > >>>>>> parameter. > >>>>>>>> But in the body of the old constructor, I consider the > >>>>>>>> default > >>>>>> value MAX > >>>>>>>> for the validTo field. I think MAX must be the default > >>>>>>>> value for > >>>>>> `validTo` > >>>>>>>> since before inserting a tombstone or a new value for the > >>>>>>>> same > >>>>>>>> key, > >>>>>> the > >>>>>>>> value must be valid till iternity. > >>>>>>>> 3. > >>>>>>>> > >>>>>>>> Regarding changing the ValueIterator interface from `public > >>>>>> interface > >>>>>>>> ValueIterator<V> extends Iterator<V>` to `public interface > >>>>>>>> ValueIterator<V> extends Iterator<VersionedRecord<V>>`: > >>>>>>>> Matthias, I > >>>>>> do not > >>>>>>>> know how it improves type safety because the > >>>>>>>> MultiVersionedKeyQuery > >>>>>> class > >>>>>>>> returns a ValueIterator of VersionedRecord any way. But if > we > >>>>>>>> want > >>>>>> to be > >>>>>>>> consistent with KeyValueIterator, we must apply the > >>>>>>>> changes you > >>>>>> suggested. > >>>>>>>> 4. > >>>>>>>> > >>>>>>>> Regarding adding the new `get(key, fromTime, toTime)` method > >>>>>>>> to the > >>>>>>>> public interface `VersionedKeyValueStore` or only adding > >>>>>>>> it to > >>>>>>>> the > >>>>>>>> class `RocksDBVersionedStore`: In the KIP, I changed the > >>>>>>>> interface > >>>>>> as > >>>>>>>> Victoria suggested. But still, I am not convinced why we > >>>>>>>> do that. > >>>>>> @Victoria: > >>>>>>>> Do you please clarify why we have to define it in the > >>>>>>>> interface? > >>>>>> More > >>>>>>>> specifically, why do we need to use generic > >>>>>>>> VersionedKeyValueStore > >>>>>>>> instead of simply using the implementing classes? > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> Alieh > >>>>>>>> > >>>>>>>> On Sat, Oct 14, 2023 at 3:35 AM Guozhang Wang < > >>>>>> guozhang.wang...@gmail.com> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks Alieh for the KIP, as well as a nice summary of all the > >>>>>>>>> discussions! Just my 2c regarding your open questions: > >>>>>>>>> > >>>>>>>>> 1. I just checked KIP-889 > >>>>>>>>> ( > >>>>>>>>> > >>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores > >>>>>>>>> ) > >>>>>>>>> and we used "VersionedRecord<V> get(K key, long asOfTimestamp);", > >>>>>>>>> so I > >>>>>>>>> feel to be consistent with this API is better compared with being > >>>>>>>>> consistent with "WindowKeyQuery"? > >>>>>>>>> > >>>>>>>>> 3. I agree with Matthias that naming is always tricky, and I also > >>>>>>>>> tend > >>>>>>>>> to be consistent with what we already have (even if in retro it > >>>>>>>>> may > >>>>>>>>> not be the best idea :P and if that was really becoming a > >>>>>>>>> complaint, > >>>>>>>>> we would change it across the board in one shot as well later). > >>>>>>>>> > >>>>>>>>> Guozhang > >>>>>>>>> > >>>>>>>>> On Wed, Oct 11, 2023 at 9:12 PM Matthias J. Sax < > mj...@apache.org> > >>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Thanks for the update! > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Some thoughts about changes you made and open questions you > >>>>>>>>>> raised: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 10) About asOf vs until: I was just looking into > >>>>>>>>>> `WindowKeyQuery`, > >>>>>>>>>> `WindowRangeQuery` and also `ReadOnlyWindowStore` interfaces. > For > >>>>>> those, > >>>>>>>>>> we use "timeFrom" and "timeTo", so it seems best to actually use > >>>>>>>>>> `to(Instant toTime)` to keep the naming consistent across the > >>>>>>>>>> board? > >>>>>>>>>> > >>>>>>>>>> If yes, we should also do `from (Instant fromTime)` and use > >>>>>>>>>> getters > >>>>>>>>>> `fromTime()` and `toTime()` -- given that it's range bounds it > >>>>>>>>>> seems > >>>>>>>>>> acceptable to me, to diverge a little bit from KIP-960 > >>>>>> `asOfTimestamp()` > >>>>>>>>>> -- but we could also rename it to `asOfTime()`? -- Given that we > >>>>>>>>>> strongly type with `Instant` I am not worried about semantic > >>>>>> ambiguity. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 20) About throwing a NPE when time bounds are `null` -- why? > (For > >>>>>>>>>> the > >>>>>>>>>> key it makes sense as is mandatory to have a key.) Could we not > >>>>>>>>>> interpret `null` as "no bound". We did KIP-941 to add `null` for > >>>>>>>>>> open-ended `RangeQueries`, so I am wondering if we should just > >>>>>>>>>> stick > >>>>>> to > >>>>>>>>>> the same semantics? > >>>>>>>>>> > >>>>>>>>>> Cf > >>>>>>>>>> > >>>>>>>>> > >>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 30) About the class naming. That's always tricky, and I am not > >>>>>>>>>> married > >>>>>>>>>> to my proposal. I agree with Bruno that the other suggested > names > >>>>>>>>>> are > >>>>>>>>>> not really better. -- The underlying idea was, to get some > >>>>>> "consistent" > >>>>>>>>>> naming across the board. > >>>>>>>>>> > >>>>>>>>>> Existing `KeyQuery` > >>>>>>>>>> New `VersionedKeyQuery` (KIP-960; we add a prefix) > >>>>>>>>>> New `MultiVersionKeyQuery` (this KIP; extend the prefix with a > >>>>>>>>> pre-prefix) > >>>>>>>>>> > >>>>>>>>>> Existing `RangeQuery` > >>>>>>>>>> New `MultiVersionRangeQuery` (KIP-969; add same prefix as above) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 40) I am fine with not adding `range(from, to)` -- it was just > an > >>>>>> idea. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Some more follow up question: > >>>>>>>>>> > >>>>>>>>>> 50) You propose to add a new constructor and getter to > >>>>>> `VersionedRecord` > >>>>>>>>>> -- I am wondering if this implies that `validTo` is optional > >>>>>>>>>> because > >>>>>> the > >>>>>>>>>> existing constructor is not deprecated? -- Also, what happens if > >>>>>>>>>> `validTo` is not set and `valueTo()` is called? Or do we > >>>>>>>>>> intent to > >>>>>> make > >>>>>>>>>> `validTo` mandatory? > >>>>>>>>>> > >>>>>>>>>> Maybe this question can only be answered when working on the > >>>>>>>>>> code, > >>>>>> but I > >>>>>>>>>> am wondering if we should make `validTo` mandatory or not... And > >>>>>>>>>> what > >>>>>>>>>> the "blast radius" of changing `VersionedRecord` will be in > >>>>>>>>>> general. > >>>>>> Do > >>>>>>>>>> you have already some POC PR that we could look at to get some > >>>>>>>>>> signals > >>>>>>>>>> about this? > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 60) The new query class is defined to return > >>>>>>>>>> `ValueIterator<VersionedRecord<V>>` -- while I like the idea > >>>>>>>>>> to add > >>>>>>>>>> `ValueIterator<V>` in a generic way on the one hand, I am > >>>>>>>>>> wondering if > >>>>>>>>>> it might be better to change it, and enforce its usage (ie, > >>>>>>>>>> return > >>>>>> type) > >>>>>>>>>> of `VersionedRecord` to improve type safety (type erasure is > >>>>>>>>>> often a > >>>>>>>>>> pain, and we could mitigate it this way). > >>>>>>>>>> > >>>>>>>>>> Btw: We actually do a similar thing for `KeyValueIterator`. > >>>>>>>>>> > >>>>>>>>>> Ie, > >>>>>>>>>> > >>>>>>>>>> public interface ValueIterator<V> extends > >>>>>> Iterator<ValueAndTimestamp<V>> > >>>>>>>>>> > >>>>>>>>>> and > >>>>>>>>>> > >>>>>>>>>> ValueAndTimestamp<V> peek(); > >>>>>>>>>> > >>>>>>>>>> This would imply that the return type of the new query is > >>>>>>>>>> `ValueIterator<V>` on the interface what seems simpler and more > >>>>>> elegant? > >>>>>>>>>> > >>>>>>>>>> If we go with the change, I am also wondering if we need to > >>>>>>>>>> find a > >>>>>>>>>> better name for the new iterator class? Maybe > >>>>>>>>>> `VersionIterator` or > >>>>>>>>>> something like this? > >>>>>>>>>> > >>>>>>>>>> Of course it might limit the use of `ValueIterator` for other > >>>>>>>>>> value > >>>>>>>>>> types -- not sure if this a limitation that is prohibitive? My > >>>>>>>>>> gut > >>>>>>>>>> feeling is, that is should not be too limiting. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 70) Do we really need the change in `VersionedKeyValueStore` and > >>>>>>>>>> add a > >>>>>>>>>> new method? In the end, the idea of IQv2 is to avoid exactly > >>>>>>>>>> this... > >>>>>> It > >>>>>>>>>> was the main issue for IQv1, that the base interface of the > store > >>>>>> needed > >>>>>>>>>> an update and thus all classed implementing the base interface, > >>>>>>>>>> making > >>>>>>>>>> it very cumbersome to add new query types. -- Of course, we need > >>>>>>>>>> this > >>>>>>>>>> new method on the actually implementation (as private method) > >>>>>>>>>> that can > >>>>>>>>>> be called from `query()` method, but adding it to the interface > >>>>>>>>>> seems > >>>>>> to > >>>>>>>>>> defeat the purpose of IQv2. > >>>>>>>>>> > >>>>>>>>>> Note, for existing IQv2 queries types that go against others > >>>>>>>>>> stores, > >>>>>> the > >>>>>>>>>> public methods already existed when IQv2 was introduces, and > thus > >>>>>>>>>> the > >>>>>>>>>> implementation of these query types just pragmatically re-used > >>>>>> existing > >>>>>>>>>> methods -- but it does not imply that new public method should > be > >>>>>> added. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -Matthias > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 10/11/23 5:11 AM, Bruno Cadonna wrote: > >>>>>>>>>>> Thanks for the updates, Alieh! > >>>>>>>>>>> > >>>>>>>>>>> The example in the KIP uses the allVersions() method which we > >>>>>>>>>>> agreed > >>>>>>>>> to > >>>>>>>>>>> remove. > >>>>>>>>>>> > >>>>>>>>>>> Regarding your questions: > >>>>>>>>>>> 1. asOf vs. until: I am fine with both but slightly prefer > >>>>>>>>>>> until. > >>>>>>>>>>> 2. What about KeyMultiVersionsQuery, KeyVersionsQuery (KIP-960 > >>>>>>>>>>> would > >>>>>>>>>>> then be KeyVersionQuery). However, I am also fine with > >>>>>>>>>>> MultiVersionedKeyQuery since none of the names sounds better or > >>>>>>>>>>> worse > >>>>>>>>> to > >>>>>>>>>>> me. > >>>>>>>>>>> 3. I agree with you not to introduce the method with the two > >>>>>>>>>>> bounds > >>>>>> to > >>>>>>>>>>> keep things simple. > >>>>>>>>>>> 4. Forget about fromTime() an asOfTime(), from() and asOf() is > >>>>>>>>>>> fine. > >>>>>>>>>>> 5. The main purpose is to show how to use the API. Maybe make > an > >>>>>>>>> example > >>>>>>>>>>> with just the key to distinguish this query from the single > >>>>>>>>>>> value > >>>>>>>>> query > >>>>>>>>>>> of KIP-960 and then one with a key and a time range. When you > >>>>>>>>>>> iterate > >>>>>>>>>>> over the results you could also call validTo(). Maybe add some > >>>>>>>>>>> actual > >>>>>>>>>>> records in the comments to show what the result might look > like. > >>>>>>>>>>> > >>>>>>>>>>> Regarding the test plan, I hope you also plan to add unit > >>>>>>>>>>> tests in > >>>>>> all > >>>>>>>>>>> of your KIPs. Maybe you could also explain why system tests > >>>>>>>>>>> are not > >>>>>>>>>>> needed here. > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Bruno > >>>>>>>>>>> > >>>>>>>>>>> On 10/10/23 5:36 PM, Alieh Saeedi wrote: > >>>>>>>>>>>> Thank you all for the very exact and constructive comments. I > >>>>>>>>>>>> really > >>>>>>>>>>>> enjoyed reading your ideas and all the important points you > >>>>>>>>>>>> made me > >>>>>>>>> aware > >>>>>>>>>>>> of. I updated KIP-968 as follows: > >>>>>>>>>>>> > >>>>>>>>>>>> 1. If the key or time bounds are null, the method > >>>>>>>>>>>> returns > >>>>>>>>>>>> NPE. > >>>>>>>>>>>> 2. The "valid" word: I removed the sentence "all the > >>>>>>>>>>>> records > >>>>>>>>> that are > >>>>>>>>>>>> valid..." and replaced it with an exact explanation. > >>>>>>>>>>>> More > >>>>>> over, I > >>>>>>>>>>>> explained > >>>>>>>>>>>> it with an example in the KIP but not in the > >>>>>>>>>>>> javadocs. Do I > >>>>>> need > >>>>>>>>>>>> to add the > >>>>>>>>>>>> example to the javadocs as well? > >>>>>>>>>>>> 3. Since I followed Bruno's suggestion and removed the > >>>>>>>>> allVersions() > >>>>>>>>>>>> method, the problem of meaningless combinations is > >>>>>>>>>>>> solved, and > >>>>>> I > >>>>>>>>>>>> do not > >>>>>>>>>>>> need any IllegalArgumentException or something like > >>>>>>>>>>>> that. > >>>>>>>>>>>> Therefore, the > >>>>>>>>>>>> change is that if no time bound is specified, the query > >>>>>>>>>>>> returns > >>>>>>>>>>>> the records > >>>>>>>>>>>> with the specified key for all timestamps (all > >>>>>>>>>>>> versions). > >>>>>>>>>>>> 4. As Victoria suggested, adding a method to the > >>>>>>>>>>>> *VersionedKeyValueStore > >>>>>>>>>>>> *interface is essential. So I did that. I had this > >>>>>>>>>>>> method > >>>>>>>>>>>> only > >>>>>>>>> in the > >>>>>>>>>>>> RocksDBVersionedStore class, which was not enough. > >>>>>>>>>>>> 5. I added the *validTo* field to the VersionedRecord > >>>>>>>>>>>> class to > >>>>>> be > >>>>>>>>>>>> able > >>>>>>>>>>>> to represent the tombstones. As you suggested, we > >>>>>>>>>>>> postpone > >>>>>>>>> solving > >>>>>>>>>>>> the > >>>>>>>>>>>> problem of retrieving consecutive tombstones for later. > >>>>>>>>>>>> 6. I added the "Test Plan" section to all KIPs. I hope > >>>>>>>>>>>> what I > >>>>>>>>>>>> wrote is > >>>>>>>>>>>> convincing. > >>>>>>>>>>>> 7. I added the *withAscendingTimestamp()* method to > >>>>>>>>>>>> provide > >>>>>> more > >>>>>>>>>>>> code readability > >>>>>>>>>>>> for the user. > >>>>>>>>>>>> 8. I removed the evil word "get" from all getter > >>>>>>>>>>>> methods. > >>>>>>>>>>>> > >>>>>>>>>>>> There have also been some more suggestions which I am still > not > >>>>>>>>> convinced > >>>>>>>>>>>> or clear about them: > >>>>>>>>>>>> > >>>>>>>>>>>> 1. Regarding asOf vs until: reading all comments, my > >>>>>>>>>>>> conclusion > >>>>>>>>>>>> was that > >>>>>>>>>>>> I keep it as "asOf" (following Walker's idea as the > >>>>>>>>>>>> native > >>>>>>>>> speaker > >>>>>>>>>>>> as well > >>>>>>>>>>>> as Bruno's suggestion to be consistent with > >>>>>> single-key_single_ts > >>>>>>>>>>>> queries). > >>>>>>>>>>>> But I do not have a personal preference. If you > >>>>>>>>>>>> insist on > >>>>>>>>> "until", > >>>>>>>>>>>> I change > >>>>>>>>>>>> it. > >>>>>>>>>>>> 2. Bruno suggested renaming the class > >>>>>>>>>>>> "MultiVersionedKeyQuery" > >>>>>>>>> to sth > >>>>>>>>>>>> else. We already had a long discussion about the name > >>>>>>>>>>>> with > >>>>>>>>>>>> Matthias. I am > >>>>>>>>>>>> open to renaming it to something else, but do you > >>>>>>>>>>>> have any > >>>>>> ideas? > >>>>>>>>>>>> 3. Matthias suggested having a method with two input > >>>>>>>>>>>> parameters > >>>>>>>>> that > >>>>>>>>>>>> enables the user to specify both time bounds in the > same > >>>>>> method. > >>>>>>>>>>>> Isn't it > >>>>>>>>>>>> introducing redundancy? It is somehow disrespectful to > >>>>>>>>>>>> the idea > >>>>>>>>> of > >>>>>>>>>>>> having > >>>>>>>>>>>> composable methods. > >>>>>>>>>>>> 4. Bruno suggested renaming the methods "asOf" and > >>>>>>>>>>>> "from" to > >>>>>>>>>>>> "asOfTime" > >>>>>>>>>>>> and "fromTime". If I do that, then it is not > >>>>>>>>>>>> consistent with > >>>>>>>>> KIP-960. > >>>>>>>>>>>> Moreover, the input parameter is clearly a timestamp, > >>>>>>>>>>>> which > >>>>>>>>> explains > >>>>>>>>>>>> enough. What do you think about that? > >>>>>>>>>>>> 5. I was asked to add more examples to the example > >>>>>>>>>>>> section. My > >>>>>>>>>>>> question > >>>>>>>>>>>> is, what is the main purpose of that? If I know it > >>>>>>>>>>>> clearly, > >>>>>> then > >>>>>>>>> I > >>>>>>>>>>>> can add > >>>>>>>>>>>> what you mean. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Alieh > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Oct 10, 2023 at 1:13 AM Matthias J. Sax > >>>>>>>>>>>> <mj...@apache.org> > >>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Bruno and I had some background conversation about the `get` > >>>>>>>>>>>>> prefix > >>>>>>>>>>>>> question including a few other committers. > >>>>>>>>>>>>> > >>>>>>>>>>>>> The official policy was never changed, and we should not > >>>>>>>>>>>>> add the > >>>>>>>>>>>>> `get`-prefix. It's a slip on our side in previous KIPs to > >>>>>>>>>>>>> add the > >>>>>>>>>>>>> `get`-prefix and we should actually clean it up doing a > >>>>>>>>>>>>> follow up > >>>>>>>>> KIP. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 10/5/23 5:26 AM, Bruno Cadonna wrote: > >>>>>>>>>>>>>> Hi Matthias, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Given all the IQv2 KIPs that use getX and given recent PRs > >>>>>>>>> (internal > >>>>>>>>>>>>>> interfaces mainly) that got merged, I was under the > >>>>>>>>>>>>>> impression > >>>>>>>>> that we > >>>>>>>>>>>>>> moved away from the strict no-getX policy. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I do not think it was an accident using getX in the IQv2 > KIPs > >>>>>> since > >>>>>>>>>>>>>> somebody would have brought it up, otherwise. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I am fine with both types of getters. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> If we think, we need to discuss this in a broader context, > >>>>>>>>>>>>>> let's > >>>>>>>>>>>>>> start a > >>>>>>>>>>>>>> separate thread. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 10/5/23 7:44 AM, Matthias J. Sax wrote: > >>>>>>>>>>>>>>> I agree to (almost) everything what Bruno said. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> In general, we tend to move away from using getters > without > >>>>>>>>> "get", > >>>>>>>>>>>>>>>> recently. So I would keep the "get". > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This is new to me? Can you elaborate on this point? Why > >>>>>>>>>>>>>>> do you > >>>>>>>>> think > >>>>>>>>>>>>>>> that's the case? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I actually did realize (after Walker mentioned it) that > >>>>>>>>>>>>>>> existing > >>>>>>>>> query > >>>>>>>>>>>>>>> types use `get` prefix, but to me it seems that it was by > >>>>>>>>> accident and > >>>>>>>>>>>>>>> we should consider correcting it? Thus, I would actually > >>>>>>>>>>>>>>> prefer > >>>>>>>>> to not > >>>>>>>>>>>>>>> add the `get` prefix for new methods query types. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> IMHO, we should do a follow up KIP to deprecate all methods > >>>>>>>>>>>>>>> with > >>>>>>>>> `get` > >>>>>>>>>>>>>>> prefix and replace them with new ones without `get` -- > >>>>>>>>>>>>>>> it's of > >>>>>>>>> course > >>>>>>>>>>>>>>> always kinda "unnecessary" noise, but if we don't do it, we > >>>>>>>>>>>>>>> might > >>>>>>>>> get > >>>>>>>>>>>>>>> into more and more inconsistent naming what would result > >>>>>>>>>>>>>>> in a > >>>>>>>>> "bad" > >>>>>>>>>>>>>>> API. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If we indeed want to change the convention and use the > `get` > >>>>>>>>> prefix, I > >>>>>>>>>>>>>>> would strongly advocate to bit the bullet and do KIP to > >>>>>>>>> pro-actively > >>>>>>>>>>>>>>> add the `get` "everywhere" it's missing... But overall, it > >>>>>>>>>>>>>>> seems > >>>>>>>>> to be > >>>>>>>>>>>>>>> a much broader decision, and we should get buy in from many > >>>>>>>>> committers > >>>>>>>>>>>>>>> about it -- as long as there is no broad consensus to add > >>>>>>>>>>>>>>> `get` > >>>>>>>>>>>>>>> everywhere, I would strongly prefer not to diverge from the > >>>>>>>>> current > >>>>>>>>>>>>>>> agreement to omit `get`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 10/4/23 2:36 AM, Bruno Cadonna wrote: > >>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regarding tombstones: > >>>>>>>>>>>>>>>> As far as I understand, we need to add either a validTo > >>>>>>>>>>>>>>>> field to > >>>>>>>>>>>>>>>> VersionedRecord or we need to return tombstones, > >>>>>>>>>>>>>>>> otherwise the > >>>>>>>>> result > >>>>>>>>>>>>>>>> is not complete, because users could never know a record > >>>>>>>>>>>>>>>> was > >>>>>>>>> deleted > >>>>>>>>>>>>>>>> at some point before the second non-null value was put. > >>>>>>>>>>>>>>>> I like more adding the validTo field since it makes the > >>>>>>>>>>>>>>>> result > >>>>>>>>> more > >>>>>>>>>>>>>>>> concise and easier interpretable. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Extending on Victoria's example, with the following puts > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> put(k, v1, time=0) > >>>>>>>>>>>>>>>> put(k, null, time=5) > >>>>>>>>>>>>>>>> put(k, null, time=10) > >>>>>>>>>>>>>>>> put(k, null, time=15) > >>>>>>>>>>>>>>>> put(k, v2, time=20) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> the result with tombstones would be > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> value, timestamp > >>>>>>>>>>>>>>>> (v1, 0) > >>>>>>>>>>>>>>>> (null, 5) > >>>>>>>>>>>>>>>> (null, 10) > >>>>>>>>>>>>>>>> (null, 15) > >>>>>>>>>>>>>>>> (v2, 20) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> instead of > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> value, timestamp, validTo > >>>>>>>>>>>>>>>> (v1, 0, 5) > >>>>>>>>>>>>>>>> (v2, 20, null) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The benefit of conciseness would already apply to one > >>>>>>>>>>>>>>>> single > >>>>>>>>>>>>>>>> tombstone. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On the other hand, why would somebody write consecutive > >>>>>>>>> tombstones > >>>>>>>>>>>>>>>> into a versioned state store? I guess if somebody does > >>>>>>>>>>>>>>>> that on > >>>>>>>>>>>>>>>> purpose, then there should be a way to retrieve each of > >>>>>>>>>>>>>>>> those > >>>>>>>>>>>>>>>> tombstones, right? > >>>>>>>>>>>>>>>> So maybe we need both -- validTo field and the option to > >>>>>>>>>>>>>>>> return > >>>>>>>>>>>>>>>> tombstones. The latter might be moved to a future KIP in > >>>>>>>>>>>>>>>> case we > >>>>>>>>> see > >>>>>>>>>>>>>>>> the need. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regarding .within(fromTs, toTs): > >>>>>>>>>>>>>>>> I would keep it simple with .from() and .asOfTimestamp() > >>>>>>>>>>>>>>>> (or > >>>>>>>>>>>>>>>> .until()). If we go with .within(), I would opt for > >>>>>>>>>>>>>>>> .withinTimeRange(fromTs, toTs), because the query > >>>>>>>>>>>>>>>> becomes more > >>>>>>>>>>>>> readable: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> MultiVersionedKeyQuery > >>>>>>>>>>>>>>>> .withKey(1) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> .withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z), > >>>>>>>>>>>>>>>> Instant.parse(2023-08-04T10:37:30.00Z)) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> If we stay with .from() and .until(), we should consider > >>>>>>>>> .fromTime() > >>>>>>>>>>>>>>>> and .untilTime() (or .toTime()): > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> MultiVersionedKeyQuery > >>>>>>>>>>>>>>>> .withKey(1) > >>>>>>>>>>>>>>>> .fromTime(Instant.parse(2023-08-03T10:37:30.00Z)) > >>>>>>>>>>>>>>>> .untilTime(Instant.parse(2023-08-04T10:37:30.00Z)) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regarding asOf vs. until: > >>>>>>>>>>>>>>>> I think asOf() is more used in point in time queries as > >>>>>>>>>>>>>>>> Walker > >>>>>>>>>>>>>>>> mentioned where this KIP specifies a time range. IMO > asOf() > >>>>>>>>>>>>>>>> fits > >>>>>>>>> very > >>>>>>>>>>>>>>>> well with KIP-960 where one version is queried, but here I > >>>>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>> .until() fits better. That might just be a matter of > >>>>>>>>>>>>>>>> taste and > >>>>>>>>> in the > >>>>>>>>>>>>>>>> end I am fine with both as long as it is well documented. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regarding getters without "get": > >>>>>>>>>>>>>>>> In the other IQv2 classes we used getters with "get". In > >>>>>>>>> general, we > >>>>>>>>>>>>>>>> tend to move away from using getters without "get", > >>>>>>>>>>>>>>>> recently. So > >>>>>>>>> I > >>>>>>>>>>>>>>>> would keep the "get". > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 10/3/23 7:49 PM, Walker Carlson wrote: > >>>>>>>>>>>>>>>>> Hey Alieh thanks for the KIP, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Weighing in on the AsOf vs Until debate I think either is > >>>>>>>>>>>>>>>>> fine > >>>>>>>>>>>>>>>>> from a > >>>>>>>>>>>>>>>>> natural language perspective. Personally AsOf makes more > >>>>>>>>>>>>>>>>> sense > >>>>>>>>> to me > >>>>>>>>>>>>>>>>> where > >>>>>>>>>>>>>>>>> until gives me the idea that the query is making a > change. > >>>>>>>>>>>>>>>>> It's > >>>>>>>>>>>>>>>>> totally a > >>>>>>>>>>>>>>>>> connotative difference and not that important. I think as > >>>>>>>>>>>>>>>>> of is > >>>>>>>>>>>>>>>>> pretty > >>>>>>>>>>>>>>>>> frequently used in point of time queries. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Also for these methods it makes sense to drop the "get" > We > >>>>>> don't > >>>>>>>>>>>>>>>>> normally use that in getters > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> * The key that was specified for this query. > >>>>>>>>>>>>>>>>> */ > >>>>>>>>>>>>>>>>> public K getKey(); > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> /** > >>>>>>>>>>>>>>>>> * The starting time point of the query, if > >>>>>>>>>>>>>>>>> specified > >>>>>>>>>>>>>>>>> */ > >>>>>>>>>>>>>>>>> public Optional<Instant> getFromTimestamp(); > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> /** > >>>>>>>>>>>>>>>>> * The ending time point of the query, if > specified > >>>>>>>>>>>>>>>>> */ > >>>>>>>>>>>>>>>>> public Optional<Instant> getAsOfTimestamp(); > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Other than that I didn't have too much to add. Overall > >>>>>>>>>>>>>>>>> I like > >>>>>>>>> the > >>>>>>>>>>>>>>>>> direction > >>>>>>>>>>>>>>>>> of the KIP and think the funcatinlyt is all there! > >>>>>>>>>>>>>>>>> best, > >>>>>>>>>>>>>>>>> Walker > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax < > >>>>>>>>> mj...@apache.org> > >>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks for the updated KIP. Overall I like it. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Victoria raises a very good point, and I personally > >>>>>>>>>>>>>>>>>> tend to > >>>>>>>>>>>>>>>>>> prefer (I > >>>>>>>>>>>>>>>>>> believe so does Victoria, but it's not totally clear > from > >>>>>>>>>>>>>>>>>> her > >>>>>>>>>>>>>>>>>> email) if > >>>>>>>>>>>>>>>>>> a range query would not return any tombstones, ie, > >>>>>>>>>>>>>>>>>> only two > >>>>>>>>> records > >>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> Victoria's example. Thus, it seems best to include a > >>>>>>>>>>>>>>>>>> `validTo` > >>>>>>>>>>>>>>>>>> ts-field > >>>>>>>>>>>>>>>>>> to `VersionedRecord` -- otherwise, the retrieved result > >>>>>>>>>>>>>>>>>> cannot > >>>>>>>>> be > >>>>>>>>>>>>>>>>>> interpreted correctly. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Not sure what others think about it. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I would also be open to actually add a > `includeDeletes()` > >>>>>>>>>>>>>>>>>> (or > >>>>>>>>>>>>>>>>>> `includeTombstones()`) method/flag (disabled by > >>>>>>>>>>>>>>>>>> default) to > >>>>>>>>> allow > >>>>>>>>>>>>>>>>>> users > >>>>>>>>>>>>>>>>>> to get all tombstone: this would only be helpful if > there > >>>>>>>>>>>>>>>>>> are > >>>>>>>>> two > >>>>>>>>>>>>>>>>>> consecutive tombstone though (if I got it right), so not > >>>>>>>>>>>>>>>>>> sure > >>>>>>>>> if we > >>>>>>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>>>> to add it or not -- it seems also possible to add it > >>>>>>>>>>>>>>>>>> later if > >>>>>>>>> there > >>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>> user demand for it, so it might be a premature > >>>>>>>>>>>>>>>>>> addition as > >>>>>> this > >>>>>>>>>>>>> point? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Nit: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> the public interface ValueIterator is used > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> "is used" -> "is added" (otherwise it sounds like as if > >>>>>>>>>>>>>>>>>> `ValueIterator` > >>>>>>>>>>>>>>>>>> exist already) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Should we also add a `.within(fromTs, toTs)` (or maybe > >>>>>>>>>>>>>>>>>> some > >>>>>>>>> better > >>>>>>>>>>>>>>>>>> name?) to allow specifying both bounds at once? The > >>>>>>>>>>>>>>>>>> existing > >>>>>>>>>>>>>>>>>> `RangeQuery` does the same for specifying the > >>>>>>>>>>>>>>>>>> key-range, so > >>>>>>>>>>>>>>>>>> might be > >>>>>>>>>>>>>>>>>> good to add for time-range too? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 9/6/23 5:01 AM, Bruno Cadonna wrote: > >>>>>>>>>>>>>>>>>>> In my last e-mail I missed to finish a sentence. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> "I think from a KIP" > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> should be > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> "I think the KIP looks good!" > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On 9/6/23 1:59 PM, Bruno Cadonna wrote: > >>>>>>>>>>>>>>>>>>>> Hi Alieh, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Thanks for the KIP! > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I think from a KIP > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 1. > >>>>>>>>>>>>>>>>>>>> I propose to throw an IllegalArgumentException or an > >>>>>>>>>>>>>>>>>>>> IllegalStateException for meaningless combinations. > >>>>>>>>>>>>>>>>>>>> In any > >>>>>>>>> case, > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> KIP should specify what exception is thrown. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 2. > >>>>>>>>>>>>>>>>>>>> Why does not specifying a range return the latest > >>>>>>>>>>>>>>>>>>>> version? I > >>>>>>>>>>>>>>>>>>>> would > >>>>>>>>>>>>>>>>>>>> expect that it returns all versions since an empty > >>>>>>>>>>>>>>>>>>>> lower or > >>>>>>>>> upper > >>>>>>>>>>>>>>>>>>>> limit is interpreted as no limit. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 3. > >>>>>>>>>>>>>>>>>>>> I second Matthias comment about replacing "asOf" with > >>>>>>>>> "until" or > >>>>>>>>>>>>>>>>>>>> "to". > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 4. > >>>>>>>>>>>>>>>>>>>> Do we need "allVersions()"? As I said above I would > >>>>>>>>>>>>>>>>>>>> return > >>>>>>>>> all > >>>>>>>>>>>>>>>>>>>> versions if no limits are specified. I think if we > >>>>>>>>>>>>>>>>>>>> get rid > >>>>>> of > >>>>>>>>>>>>>>>>>>>> allVersions() there might not be any meaningless > >>>>>> combinations > >>>>>>>>>>>>>>>>>>>> anymore. > >>>>>>>>>>>>>>>>>>>> If a user applies twice the same limit like for > example > >>>>>>>>>>>>>>>>>>>> MultiVersionedKeyQuery.with(key).from(t1).from(t2) the > >>>>>>>>>>>>>>>>>>>> last > >>>>>>>>> one > >>>>>>>>>>>>>>>>>>>> wins. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 5. > >>>>>>>>>>>>>>>>>>>> Could you add some more examples with time ranges to > >>>>>>>>>>>>>>>>>>>> the > >>>>>>>>> example > >>>>>>>>>>>>>>>>>> section? > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 6. > >>>>>>>>>>>>>>>>>>>> The KIP misses the test plan section. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 7. > >>>>>>>>>>>>>>>>>>>> I propose to rename the class to > "MultiVersionKeyQuery" > >>>>>>>>> since we > >>>>>>>>>>>>> are > >>>>>>>>>>>>>>>>>>>> querying multiple versions of the same key. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> 8. > >>>>>>>>>>>>>>>>>>>> Could you also add withAscendingTimestamps()? IMO it > >>>>>>>>>>>>>>>>>>>> gives > >>>>>>>>> users > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> possibility to make their code more readable instead > of > >>>>>>>>>>>>>>>>>>>> only > >>>>>>>>>>>>> relying > >>>>>>>>>>>>>>>>>>>> on the default. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>> Bruno > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On 8/17/23 4:13 AM, Matthias J. Sax wrote: > >>>>>>>>>>>>>>>>>>>>> Thanks for splitting this part into a separate KIP! > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> For `withKey()` we should be explicit that `null` > >>>>>>>>>>>>>>>>>>>>> is not > >>>>>>>>>>>>>>>>>>>>> allowed. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> (Looking into existing `KeyQuery` it seems the > >>>>>>>>>>>>>>>>>>>>> JavaDocs > >>>>>>>>> don't > >>>>>>>>>>>>> cover > >>>>>>>>>>>>>>>>>>>>> this either -- would you like to do a tiny cleanup > >>>>>>>>>>>>>>>>>>>>> PR for > >>>>>>>>>>>>>>>>>>>>> this, or > >>>>>>>>>>>>>>>>>>>>> fix on-the-side in one of your PRs?) > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> The key query returns all the records that are > >>>>>>>>>>>>>>>>>>>>>> valid in > >>>>>> the > >>>>>>>>>>>>>>>>>>>>>> time > >>>>>>>>>>>>>>>>>>>>>> range starting from the timestamp {@code > >>>>>>>>>>>>>>>>>>>>>> fromTimestamp}. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> In the JavaDocs you use the phrase `are valid` -- I > >>>>>>>>>>>>>>>>>>>>> think > >>>>>> we > >>>>>>>>>>>>>>>>>>>>> need to > >>>>>>>>>>>>>>>>>>>>> explain what "valid" means? It might even be worth > >>>>>>>>>>>>>>>>>>>>> to add > >>>>>>>>> some > >>>>>>>>>>>>>>>>>>>>> examples. It's annoying, but being precise if kinda > >>>>>>>>> important. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> With regard to KIP-962, should we allow `null` for > >>>>>>>>>>>>>>>>>>>>> time > >>>>>>>>> bounds ? > >>>>>>>>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>>>> JavaDocs should also be explicit if `null` is > >>>>>>>>>>>>>>>>>>>>> allowed or > >>>>>>>>> not and > >>>>>>>>>>>>>>>>>>>>> what > >>>>>>>>>>>>>>>>>>>>> the semantics are if allowed. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> You are using `asOf()` however, because we are doing > >>>>>>>>> time-range > >>>>>>>>>>>>>>>>>>>>> queries, to me using `until()` to describe the upper > >>>>>>>>>>>>>>>>>>>>> bound > >>>>>>>>> would > >>>>>>>>>>>>>>>>>>>>> sound better (I am not a native speaker though, so > >>>>>>>>>>>>>>>>>>>>> maybe I > >>>>>>>>> am > >>>>>>>>>>>>> off?) > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> The key query returns all the records that have > >>>>>>>>>>>>>>>>>>>>>> timestamp > >>>>>>>>> <= > >>>>>>>>>>>>>>>>>>>>>> {@code > >>>>>>>>>>>>>>>>>>>>>> asOfTimestamp}. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> This is only correct if not lower-bound is set, > right? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> In your reply to KIP-960 you mentioned: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> the meaningless combinations are prevented by > >>>>>>>>>>>>>>>>>>>>>> throwing > >>>>>>>>>>>>> exceptions. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> We should add corresponding JavaDocs like: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> @throws IllegalArgumentException if {@code > >>>>>>>>>>>>>>>>>>>>> fromTimestamp} is > >>>>>>>>>>>>>>>>>>>>> equal or > >>>>>>>>>>>>>>>>>>>>> larger than > >>>>>>>>>>>>>>>>>>>>> {@code > >>>>>>>>>>>>>>>>>>>>> untilTimestamp} > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Or something similar. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> With regard to KIP-960: if we need to introduce a > >>>>>>>>>>>>>>>>>>>>> `VersionedKeyQuery` > >>>>>>>>>>>>>>>>>>>>> class for single-key-single-ts lookup, would we > >>>>>>>>>>>>>>>>>>>>> need to > >>>>>> find > >>>>>>>>>>>>>>>>>>>>> a new > >>>>>>>>>>>>>>>>>>>>> name for the query class of this KIP, given that the > >>>>>>>>>>>>>>>>>>>>> return > >>>>>>>>> type > >>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>> different? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On 8/16/23 10:57 AM, Alieh Saeedi wrote: > >>>>>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I splitted KIP-960 > >>>>>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>> > >>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> into three separate KIPs. Therefore, please continue > >>>>>>>>>>>>>>>>>>>>>> discussions > >>>>>>>>>>>>>>>>>>>>>> about single-key, multi-timestamp interactive > queries > >>>>>>>>> here. You > >>>>>>>>>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>>>> see all > >>>>>>>>>>>>>>>>>>>>>> the addressed reviews on the following page. > >>>>>>>>>>>>>>>>>>>>>> Thanks in > >>>>>>>>> advance. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> KIP-968: Support single-key_multi-timestamp > >>>>>>>>>>>>>>>>>>>>>> interactive > >>>>>>>>> queries > >>>>>>>>>>>>>>>>>>>>>> (IQv2) for > >>>>>>>>>>>>>>>>>>>>>> versioned state stores > >>>>>>>>>>>>>>>>>>>>>> < > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>> > >>>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I look forward to your feedback! > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>> Alieh > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >> >