Thank you all.
I decided to remove the ordering from the KIP and maybe move it to the
subsequent KIPs (based on user demand).
I skimmed over the discussion thread, but we still had an open question
about how a user can retrieve the `latest()` values. I think what Matthias
suggested (using `TimestampedRangeQuery`) can be the solution. What do you
think? Bests,
Alieh

On Wed, Dec 6, 2023 at 1:57 PM Lucas Brutschy
<lbruts...@confluent.io.invalid> wrote:

> Hi Alieh,
>
> I think we do not have to restrict ourselves too much for the future
> and complicate the implementation. The user can always store away and
> sort, so we should only provide the ordering guarantee we can provide
> efficiently, and we shouldn't restrict our future evolution too much
> by this. I think a global ordering by timestamp is sufficient for this
> KIP, so I vote for option 2.
>
> Cheers,
> Lucas
>
> On Fri, Dec 1, 2023 at 8:45 PM Alieh Saeedi
> <asae...@confluent.io.invalid> wrote:
> >
> > Hi all,
> > I updated the KIP based on the changes made in the former KIP (KIP-968).
> So
> > with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had
> some
> > changes both in the defined fields and defined methods.
> >
> > Based on the PoC PR, what we currently promise in the KIP about ordering
> > seems like a dream. I intended to enable the user to have a global
> ordering
> > based on the key or timestamp (using `orderByKey()` or
> > `orderByTimestamp()`) and then even have a partial ordering based on the
> > other parameter.  For example, if the user specifies
> > `orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
> > global ordering is based on keys in a descending order, and then all the
> > records with the same key are ordered ascendingly based on ts. The result
> > will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
> > (assuming that k1<k2<k3 and t1<t2<t3).
> >
> > But in reality, we have limitations for having a global ordering based on
> > keys since we are iterating over the segments in a lazy manner.
> Therefore,
> > when we are processing the current segment, we have no knowledge of the
> > keys in the next segment.
> >
> > Now I have two suggestions:
> > 1. Changing the `MultiVersionedRangeQuery` class as follows:
> >
> > private final ResultOrder *segmentOrder*;
> > private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
> > TIMESTAMP_WISE
> > private final ResultOrder  *keyOrder*;
> > private final ResultOrder *timestampOrder*;
> >
> > This way, the global ordering is specified by the `segmentOrder`. It
> means
> > we either show the results from the oldest to the latest segment
> > (ASCENDING) or from the latest to the oldest segment (DESCENDING).
> > Then, inside each segment, we guarantee a `segmentContentOrder` which can
> > be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
> > specified by the `keyOrder` and `timestampOrder` properties,
> respectively.
> > If the content of a segment must be ordered key-wise and then we have two
> > records with the same key (it happens in older segments), then the
> > `timestampOrder` determines the order between them.
> >
> > 2. We define that global ordering can only be based on timestamps (the
> > `timestampOrder` property), and if two records have the same timestamp,
> the
> > `keyOrder` determines the order between them.
> >
> > I think the first suggestion gives more flexibility to the user, but it
> is
> > more complicated. I mean, it needs good Javadocs.
> >
> > I look forward to your ideas.
> >
> > Cheers,
> > Alieh
> >
> >
> > On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <asae...@confluent.io>
> wrote:
> >
> > > Thank you, Bruno and Matthias.
> > > I updated the KIP as follows:
> > > 1. The one remaining `asOf` word in the KIP is removed.
> > > 2. Example 2 is updated. Thanks, Bruno for the correction.
> > >
> > > Discussions and open questions
> > > 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
> > > Because the results must have a global ordering. Either based on key or
> > > based on ts. For example, we can have
> > > `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
> > > global ordering is based on keys in a descending order, and then all
> the
> > > records with the same key are ordered ascendingly based on ts. The
> result
> > > will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
> > > (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
> > > 2. About having the `latest()` method: it seems like we are undecided
> yet.
> > > Adding a new class or ignoring `latest()` for VersionedRangeQuery and
> > > instead using the `TimestampedRangeQuery` as Matthias suggested.
> > >
> > > Cheers,
> > > Alieh
> > >
> > > On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org>
> wrote:
> > >
> > >> Great discussion. Seems we are making good progress.
> > >>
> > >> I see advantages and disadvantages in splitting out a "single-ts
> > >> key-range" query type. I guess, the main question might be, what
> > >> use-cases do we anticipate and how common we believe they are?
> > >>
> > >> We should also take KIP-992 (adding `TimestampedRangeQuery`) into
> account.
> > >>
> > >> (1) The most common use case seems to be, a key-range over latest. For
> > >> this one, we could use `TimestampedRangeQuery` -- it would return a
> > >> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
> > >> be any information loss, because "toTimestamp" would be `null` anyway.
> > >>
> > >>
> > >> (2) The open question is, how common is a key-range in a point in the
> > >> past? For this case, using
> > >> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
> > >> actually not to be a bad UX, and also does not really need to be
> > >> explained how to do this (compared to "latest" that required to pass
> in
> > >> MAX_INT).
> > >>
> > >>
> > >> If we add a new query type, we avoid both issues (are they actually
> > >> issues?) and add some nice syntactic sugar to the API. The question
> is,
> > >> if it's worth the effort and expanded API surface area?
> > >>
> > >> To summarize:
> > >>
> > >> Add new query type:
> > >>
> > >> > // queries latest; returns VersionedRecords
> > >> > VersionedRangeQuery.withKeyRange(...);
> > >> >
> > >> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
> > >>
> > >> vs
> > >>
> > >> No new query type:
> > >>
> > >> > // queries latest; returns ValueAndTimestamps
> > >> > TimestampedRangeQuery.withRange(...);
> > >> >
> > >> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
> > >>
> > >>
> > >>
> > >> I guess, bottom line, I would be ok with either one and I am actually
> > >> not even sure which one I prefer personally. Just wanted to lay out
> the
> > >> tradeoffs I see. Not sure if three are other considerations that would
> > >> tip the scale into either direction?
> > >>
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
> > >> > Hi Alieh,
> > >> >
> > >> > I like the examples!
> > >> >
> > >> >
> > >> > 1.
> > >> > Some terms like `asOf` in the descriptions still need to be
> replaced in
> > >> > the KIP.
> > >> >
> > >> >
> > >> > 2.
> > >> > In your last e-mail you state:
> > >> >
> > >> > "How can a user retrieve the latest value? We have the same issue
> with
> > >> > kip-968 as well."
> > >> >
> > >> > Why do we have the same issue in KIP-968?
> > >> > If users need to retrieve the latest value for a specific key, they
> > >> > should use KIP-960.
> > >> >
> > >> >
> > >> > 3.
> > >> > Regarding querying the latest version (or an asOf version) of
> records
> > >> in
> > >> > a given key range, that is exactly why I proposed to split the query
> > >> > class. One class would return the latest and the asOf versions
> (i.e. a
> > >> > single version) of records in a key range and the other class would
> > >> > return all versions in a given time range (i.e. multiple versions)
> of
> > >> > the records in a given key range. The splitting in two classes
> avoids
> > >> to
> > >> > specify a time range and latest or a time range and asOf on a given
> key
> > >> > range.
> > >> >
> > >> > Alternatively, you could keep one class and you could specify that
> the
> > >> > last call wins as you specified for fromTime() and toTime(). For
> > >> > example, for
> > >> >
> > >> >
> > >>
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
> > >> >
> > >> > latest() wins. However, how would you interpret
> > >> >
> > >> >
> > >>
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
> > >> >
> > >> > Is it [t1, t2] or [-INF, t2]?
> > >> > (I would say the latter, but somebody else would say differently)
> > >> >
> > >> > The two class solution seems cleaner to me since we do not need to
> > >> > consider those edge cases.
> > >> > You could propose both classes in this KIP.
> > >> >
> > >> >
> > >> > 4.
> > >> > Why do we need orderByKey() and orderByTimestamp()?
> > >> > Aren't withAscendingKeys(), withDescendingKeys(),
> > >> > withAscendingTimestamps(), and withDescendingTimestamps()
> sufficient?
> > >> >
> > >> >
> > >> > 5.
> > >> > In example 2, why is
> > >> >
> > >> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
> > >> > 2023-01-25T10:00:00.00Z
> > >> >
> > >> > not part of the result?
> > >> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
> > >> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
> > >> > 2023-01-30T10:00:00.00Z] of the query.
> > >> >
> > >> > (BTW, in the second example, you forgot to add "key" to the output.)
> > >> >
> > >> >
> > >> > Best,
> > >> > Bruno
> > >> >
> > >> >
> > >> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
> > >> >> Thanks, Matthias and Bruno.
> > >> >> Here is a list of updates:
> > >> >>
> > >> >>     1. I changed the variable and method names as I did for
> KIP-968, as
> > >> >>     follows:
> > >> >>        - "fromTimestamp" -> fromTime
> > >> >>        - "asOfTimestamp"-> toTime
> > >> >>        - "from(instant)" -> fromTime(instant)"
> > >> >>        - asOf(instant)"->toTime(instant)
> > >> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
> > >> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods
> for
> > >> >> user
> > >> >>     readability.
> > >> >>     3. I updated the "Example" section as well.
> > >> >>
> > >> >> Some points:
> > >> >>
> > >> >>     1. Even though the kip suggests adding the `get(k
> lowerkeybound, k
> > >> >>     upperkeybound, long fromtime, long totime)` method to the
> > >> >> interface, I
> > >> >>     added this method to the `rocksdbversionedstore` class for now.
> > >> >>     2. Matthias, you mentioned a very important point. How can a
> user
> > >> >>     retrieve the latest value? We have the same issue with kip-968
> as
> > >> >> well.
> > >> >>     Asking a user to call `toTime(max)` violates the API design
> rules,
> > >> >> as you
> > >> >>     mentioned. So I think we must have a `latest()` method for both
> > >> >> KIP-968 and
> > >> >>     KIP-969. What do you think about that?
> > >> >>
> > >> >>
> > >> >> Cheers,
> > >> >> Alieh
> > >> >>
> > >> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
> > >> wrote:
> > >> >>
> > >> >>> Thanks for the update.
> > >> >>>
> > >> >>>
> > >> >>>> To retrieve
> > >> >>>>>      the latest value(s), the user must call just the asOf
> method
> > >> with
> > >> >>> the MAX
> > >> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you
> think
> > >> >>>>> it is
> > >> >>> clumsy,
> > >> >>>>>      Matthias?
> > >> >>>
> > >> >>>
> > >> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
> > >> optional,
> > >> >>> and default is "latest", right? So while `asOf(MAX)` does the same
> > >> >>> thing, practically users would never call `asOf` for a "latest"
> query?
> > >> >>>
> > >> >>> In this KIP, we enforce that users give us a key range (we have
> the 4
> > >> >>> static entry point methods to define a query for this), and we
> say we
> > >> >>> default to "no bounds" for time range by default.
> > >> >>>
> > >> >>> The existing `RangeQuery` allows to query a range of keys for
> existing
> > >> >>> stores. It seems to be a common pattern to query a key-range on
> > >> latest.
> > >> >>> -- in the current proposal, users would need to do:
> > >> >>>
> > >> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
> > >> >>>
> > >> >>> Would like to hear from others if we think that's good user
> > >> experience?
> > >> >>> If we agree to accept this, I think we should explain how to do
> this
> > >> in
> > >> >>> the JavaDocs (and also regular docs... --- otherwise, I can
> already
> > >> >>> anticipate user question on all question-asking-channels how to
> do a
> > >> >>> "normal key range query". IMHO, the problem is not that the code
> > >> itself
> > >> >>> it too clumsy, but that it's totally not obvious to uses how to
> > >> express
> > >> >>> it without actually explaining it to them. It basically violated
> the
> > >> API
> > >> >>> design rule "make it easy to use / simple things should be easy".
> > >> >>>
> > >> >>> Btw: We could also re-use `RangeQuery` and add am implementation
> to
> > >> >>> `VersionedStateStore` to just accept this query type, with "key
> range
> > >> >>> over latest" semantics. -- The issue is of course, that uses need
> to
> > >> >>> know that the query would return `ValueAndTimestamp` and not
> plain `V`
> > >> >>> (or we add a translation step to unwrap the value, but we would
> lose
> > >> the
> > >> >>> "validFrom" timestamp -- validTo would be `null`). Because type
> safety
> > >> >>> is a general issue in IQv2 it would not make it worse (in the
> strict
> > >> >>> sense), but I am also not sure if we want to dig an even deeper
> > >> hole...
> > >> >>>
> > >> >>>
> > >> >>> -Matthias
> > >> >>>
> > >> >>>
> > >> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> > >> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is
> a
> > >> >>> summary
> > >> >>>> of the updates I made to the KIP:
> > >> >>>>
> > >> >>>>      1.  I liked the idea of renaming methods as Matthias
> suggested.
> > >> >>>>      2. I removed the allversions() method as I did in KIP-968.
> To
> > >> >>> retrieve
> > >> >>>>      the latest value(s), the user must call just the asOf method
> > >> with
> > >> >>> the MAX
> > >> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you
> think it
> > >> is
> > >> >>> clumsy,
> > >> >>>>      Matthias?
> > >> >>>>      3. I added a method to the *VersionedKeyValueStore
> *interface,
> > >> >>>> as I
> > >> >>> did
> > >> >>>>      for KIP-968.
> > >> >>>>      4. Matthias: I do not get what you mean by your second
> comment.
> > >> >>>> Isn't
> > >> >>>>      the KIP already explicit about that?
> > >> >>>>
> > >> >>>>      > I assume, results are returned by timestamp for each key.
> The
> > >> >>>> KIP
> > >> >>>>      should be explicit about it.
> > >> >>>>
> > >> >>>>
> > >> >>>> Cheers,
> > >> >>>> Alieh
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org
> >
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>>> Thanks for updating the KIP.
> > >> >>>>>
> > >> >>>>> Not sure if I agree or not with Bruno's idea to split the query
> > >> types
> > >> >>>>> further? In the end, we split them only because there is three
> > >> >>>>> different
> > >> >>>>> return types: single value, value-iterator, key-value-iterator.
> > >> >>>>>
> > >> >>>>> What do we gain by splitting out single-ts-range-key? In the
> end,
> > >> for
> > >> >>>>> range-ts-range-key the proposed class is necessary and is a
> superset
> > >> >>>>> (one can set both timestamps to the same value, for single-ts
> > >> lookup).
> > >> >>>>>
> > >> >>>>> The mentioned simplification might apply to
> "single-ts-range-key"
> > >> >>>>> but I
> > >> >>>>> don't see a simplification for the proposed (and necessary)
> query
> > >> >>>>> type?
> > >> >>>>>
> > >> >>>>> On the other hand, I see an advantage of a single-ts-range-key
> for
> > >> >>>>> querying over the "latest version" with a range of keys. For a
> > >> >>>>> single-ts-range-key query, this it would be the default
> (similar to
> > >> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
> > >> >>>>>
> > >> >>>>> In the current version of the KIP, (if we agree that default
> should
> > >> >>>>> actually return "all versions" not "latest" -- this default was
> > >> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
> > >> >>>>> need to
> > >> >>>>> have the same default here to stay consistent), users would
> need to
> > >> >>>>> pass
> > >> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query
> the
> > >> >>>>> latest point in time only, what seems to be clumsy? Or we could
> add
> > >> a
> > >> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it
> does
> > >> >>>>> seems
> > >> >>>>> a little clumsy, too.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>> The overall order of the returned records is by Key
> > >> >>>>>
> > >> >>>>> I assume, results are returned by timestamp for each key. The
> KIP
> > >> >>>>> should
> > >> >>>>> be explicit about it.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> To be very explicit, should we rename the methods to specify
> the key
> > >> >>> bound?
> > >> >>>>>
> > >> >>>>>     - withRange -> withKeyRange
> > >> >>>>>     - withLowerBound -> withLowerKeyBound
> > >> >>>>>     - withUpperBound -> withUpperKeyBound
> > >> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> > >> >>>>> `allVersions` and not `noTimeBound` and should align the
> naming?)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> -Matthias
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> > >> >>>>>> Hi Alieh,
> > >> >>>>>>
> > >> >>>>>> Thanks for the KIP!
> > >> >>>>>>
> > >> >>>>>> One high level comment/question:
> > >> >>>>>>
> > >> >>>>>> I assume you separated single key queries into two classes
> because
> > >> >>>>>> versioned key queries return a single value and multi version
> key
> > >> >>>>>> queries return iterators. Although, range queries always return
> > >> >>>>>> iterators, it would make sense to also separate range queries
> for
> > >> >>>>>> versioned state stores into range queries that return one
> single
> > >> >>> version
> > >> >>>>>> of the keys within a range and range queries that return
> multiple
> > >> >>>>>> version of the keys within a range, IMO. That would reduce the
> > >> >>>>>> meaningless combinations.
> > >> >>>>>> WDYT?
> > >> >>>>>>
> > >> >>>>>> Best,
> > >> >>>>>> Bruno
> > >> >>>>>>
> > >> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> > >> >>>>>>> Hi all,
> > >> >>>>>>>
> > >> >>>>>>> I splitted KIP-960
> > >> >>>>>>> <
> > >> >>>>>
> > >> >>>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> > >> >>>>>>
> > >> >>>>>>> into
> > >> >>>>>>> three separate KIPs. Therefore, please continue discussions
> about
> > >> >>> range
> > >> >>>>>>> interactive queries here. You can see all the addressed
> reviews
> > >> >>>>>>> on the
> > >> >>>>>>> following page. Thanks in advance.
> > >> >>>>>>>
> > >> >>>>>>> KIP-969: Support range interactive queries (IQv2) for
> versioned
> > >> >>>>>>> state
> > >> >>>>>>> stores
> > >> >>>>>>> <
> > >> >>>>>
> > >> >>>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> > >> >>>>>>
> > >> >>>>>>>
> > >> >>>>>>> I look forward to your feedback!
> > >> >>>>>>>
> > >> >>>>>>> Cheers,
> > >> >>>>>>> Alieh
> > >> >>>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >
>

Reply via email to