Thanks Victoria. I guess an advantage of exposing a method like delete(key, timestamp) could be that, from a user's standpoint, it is a single operation and not two. The equivalent of this method, i.e., a put followed by a get, is not atomic, so exposing it certainly sounds like a good idea.
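Just to make sure I am picturing the same thing, here is a rough sketch of the store methods as I currently understand them from this thread. This is only my reading of the discussion, not anything final from the KIP -- names, return types, and the exact delete() behavior are all still open:

import org.apache.kafka.streams.state.ValueAndTimestamp;

public interface VersionedKeyValueStore<K, V> {

    // Adds a new record version for the key; the given (event-time) timestamp
    // becomes the new version's validFrom.
    void put(K key, V value, long timestamp);

    // Syntactic sugar for get(key, Long.MAX_VALUE), i.e., the latest version.
    ValueAndTimestamp<V> get(K key);

    // Returns the record version that was valid as of the given timestamp
    // (inclusive "AS OF" bound), or null if there is no such version or it has
    // fallen out of history retention.
    ValueAndTimestamp<V> get(K key, long timestampTo);

    // The syntactic sugar under discussion: same effect on the store as
    // put(key, null, timestamp), except it additionally returns the record
    // version that the tombstone replaces (if any), as a single operation from
    // the caller's point of view.
    ValueAndTimestamp<V> delete(K key, long timestamp);
}

If I have misread any of the semantics above, please correct me.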
Thanks! Sagar. On Tue, Nov 29, 2022 at 1:15 AM Victoria Xia <victoria....@confluent.io.invalid> wrote: > Thanks, Sagar and Bruno, for your insights and comments! > > > Sagar: Can we name according to the semantics that you want to > support like `getAsOf` or something like that? I am not sure if we do that > in our codebase though. Maybe the experts can chime in. > > Because it is a new method that will be added, we should be able to name it > whatever we like. I agree `getAsOf` is more clear, albeit wordier. > Introducing `getAsOf(key, timestamp)` means we could leave open `get(key, > timeFrom, timeTo)` to have an exclusive `timeTo` without introducing a > collision. (We could introduce `getBetween(key, timeFrom, timeTo)` instead > to delineate even more clearly, though this is better left for a future > KIP.) > > I don't think there's any existing precedent in codebase to follow here but > I'll leave that to the experts. Curious to hear what others prefer as well. > > > Sagar: With delete, we would stlll keep the older versions of the key > right? > > We could certainly choose this for the semantics of delete(...) -- and it > sounds like we should too, based on Bruno's confirmation below that this > feels more natural to him as well -- but as Bruno noted in his message > below I think we'll want the method signature to be `delete(key, > timestamp)` then, so that there is an explicit timestamp to associate with > the deletion. In other words, `delete(key, timestamp)` has the same effect > as `put(key, null, timestamp)`. The only difference is that the `put(...)` > method has a `void` return type, while `delete(key, timestamp)` can have > `ValueAndTimestamp` as return type in order to return the record which is > replaced (if any). In other words, `delete(key, timestamp)` is equivalent > to `put(key, null, timestamp)` followed by `get(key, timestamp)`. > > > Bruno: I would also not change the semantics so that it deletes all > versions of > a key. I would rather add a new method purge(key) or > deleteAllVersions(key) or similar if we want to have such a method in > this first KIP. > > Makes sense; I'm convinced. Let's defer > `purge(key)`/`deleteAllVersions(key)` to a future KIP. If there's agreement > that `delete(key, timestamp)` (as described above) is valuable, we can keep > it in this first KIP even though it is syntactic sugar. If this turns into > a larger discussion, we can defer this to a future KIP as well. > > > Bruno: I would treat the history retention as a strict limit. [...] You > could also add historyRetentionMs() to the VersionedKeyValueStore<K, V> > interface to make the concept of the history retention part of the > interface. > > OK. That's the second vote for rewording the javadoc for > `VersionedKeyValueStore#get(key, timestampTo)` to remove the parenthetical > and clarify that history retention should be used to dictate this case, so > I'll go ahead and do that. I'll leave out adding `historyRetentionMs()` to > the interface for now, though, for the sake of consistency with other > stores (e.g., window stores) which don't expose similar types of > configurations from their interfaces. > > > Bruno: exclusive vs inclusive regarding validTo timestamp in get(). > Doesn't this decision depend on the semantics of the join for which this > state store should be used? > > Yes, you are correct. 
As a user I would expect that a stream-side record > with the same timestamp as a table-side record _would_ produce a join > result, which is consistent with the proposal for timestampTo to be > inclusive. (FWIW I tried this out with a Flink temporal join just now and > observed this result as well. Not sure where to look for other standards to > validate this expectation.) > > > Bruno: If Streams does not update min.compaction.lag.ms during > rebalances, > users have to do it each time they change history retention in the code, > right? That seems odd to me. What is the actual reason for not updating > the config? How does Streams handle updates to windowed stores? > > Yes, users will have to update min.compaction.lag.ms for the changelog > topic themselves if they update history retention in their code. This is > consistent with what happens for window stores today: e.g., if a user > updates grace period for a windowed aggregation, then they are responsible > for updating retention.ms on their windowed changelog topic as well. > > I'm not familiar with the historical context around why this is the case -- > Matthias, do you know? > > My best guess is that Streams does not want to interfere with any potential > out-of-band changes by the user between application restarts, though I'm > not sure why a user would want to change this specific config to a value > which does not accord with the specified history retention. I notice that > there is code for validating topic configs and collecting validation errors > ( > > https://github.com/apache/kafka/blob/be032735b39360df1a6de1a7feea8b4336e5bcc0/streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java#L318-L319 > ) > but this method is not called from anywhere, even though there are unit > tests for it. I was unable to find history of this validation after a quick > search. Hopefully Matthias (or others) has context, otherwise I will have a > closer look. > > - Victoria > > On Wed, Nov 23, 2022 at 8:52 AM Bruno Cadonna <cado...@apache.org> wrote: > > > Hi all, > > > > Thanks for the KIP, Victoria! > > > > I have a couple of comments. > > > > 1. delete(key) > > I think delete(key) should not remove all versions of a key. We should > > use it to close the validity interval of the last version. > > Assuming we have records of different versions for key A: > > (A, e, 0, 2), > > (A, f, 2, 3), > > (A, g, 3, MAX) > > > > delete(A) would update them to > > > > (A, e, 0, 2), > > (A, f, 2, 3), > > (A, g, 3, 5) > > (A, null, 5, MAX) > > > > But then the question arises where does timestamp 5 that closes the > > interval in (A, g, 3, 5) and opens the interval in (A, null, 5, MAX) > > come from. We could use the timestamp at which delete(A) is called, but > > actually I do not like that because it seems to me it opens the doors to > > non-determinism. If we use event time for put() we should also use it > > for delete(). Actually, put(A, null, 5) would have the same effect as > > delete(A) in the example above. As a syntactical sugar, we could add > > delete(key, validFrom). (I just realized now that I just repeated what > > Victoria said in her previous e-mail.) > > I agree with Victoria that delete(A) as defined for other state stores > > is hard to re-use in the versioned key-value store. > > I would also not change the semantics so that it deletes all versions of > > a key. I would rather add a new method purge(key) or > > deleteAllVersions(key) or similar if we want to have such a method in > > this first KIP. 
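--- Quick aside, interjecting on Bruno's example above: here is how I picture that example playing out in code, assuming the delete(key, timestamp) semantics and the inclusive timestampTo we have been discussing, and reusing the interface sketch from the top of this mail. Purely illustrative.

static void brunosExample(VersionedKeyValueStore<String, String> store) {
    // Versions of key "A" from Bruno's example: ("e", 0), ("f", 2), ("g", 3).
    store.put("A", "e", 0L);
    store.put("A", "f", 2L);
    store.put("A", "g", 3L);

    // Same effect as store.put("A", null, 5L): closes the validity of ("g", 3)
    // at timestamp 5, and additionally returns the replaced version ("g", 3).
    ValueAndTimestamp<String> replaced = store.delete("A", 5L);

    store.get("A", 2L);  // ("f", 2) -- older versions remain queryable
    store.get("A", 4L);  // ("g", 3)
    store.get("A", 6L);  // null -- the key is deleted as of timestamp 5
    // (all of the above subject to history retention, of course)
}

--- End of aside; the quoted thread continues below.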
> > > > > > 2. history retention > > I would remove "(up to store implementation discretion when this is the > > case)". I would treat the history retention as a strict limit. If users > > want to implement a less strict behavior, they can still do it. Maybe > > mention in the javadocs the implications of not adhering strictly to the > > history retention. That is, the DSL might become non-deterministic. You > > could also add historyRetentionMs() to the VersionedKeyValueStore<K, V> > > interface to make the concept of the history retention part of the > > interface. > > > > 3. null vs. exception for out-of-bound queries > > I am in favor of null. The record version is not there anymore because > > it expired. This seems to me normal and nothing exceptional. That would > > also consistent with the behavior of other APIs as already mentioned. > > > > > > 4. Exposing segmentInterval > > Since we have evidence that the segment interval affects performance, I > > would expose it. But I find it also OK to expose it once we have a > > corresponding metric. > > > > 5. exclusive vs inclusive regarding validTo timestamp in get() > > Doesn't this decision depend on the semantics of the join for which this > > state store should be used? Should a record on the table side that has > > the same timestamp as the record on the stream side join? Or should only > > records in the table that are strictly before the record on the stream > > side join? > > > > > > 6. Not setting min.compaction.lag.ms during rebalances > > If Streams does not update min.compaction.lag.ms during rebalances, > > users have to do it each time they change history retention in the code, > > right? That seems odd to me. What is the actual reason for not updating > > the config? How does Streams handle updates to windowed stores? That > > should be a similar situation for the retention time config of the > > changelog topic. > > > > > > Best, > > Bruno > > > > > > > > On 23.11.22 09:11, Sagar wrote: > > > Hi Vicky, > > > > > > Thanks for your response! > > > > > > I would just use numbers to refer to your comments. > > > > > > 1) Thanks for your response. Even I am not totally sure whether these > > > should be supported via IQv2 or via store interface. That said, I > > wouldn't > > > definitely qualify this as blocking the KIP for sure so we can live > > > without it :) > > > > > > 2) Yeah if the 2 APIs for get have different semantics for timestampTo, > > > then it could be confusing. I went through the link for temporal tables > > > (TFS!) and I now get why the AS OF semantics would have it inclusive. I > > > think part of the problem is that the name get on it's own is not as > > > expressive as SQL. Can we name according to the semantics that you want > > to > > > support like `getAsOf` or something like that? I am not sure if we do > > that > > > in our codebase though. Maybe the experts can chime in. > > > > > > 3) hmm I would have named it `validUpto` But again not very picky about > > it. > > > After going through the link and your KIP, it's a lot clearer to me. > > > > > > 4) I think delete(key) should be sufficient. With delete, we would > > > stlll keep the older versions of the key right? > > > > > > Thanks! > > > Sagar. > > > > > > On Wed, Nov 23, 2022 at 12:17 AM Victoria Xia > > > <victoria....@confluent.io.invalid> wrote: > > > > > >> Thanks, Matthias and Sagar, for your comments! I've responded here for > > now, > > >> and will update the KIP afterwards with the outcome of our discussions > > as > > >> they resolve. 
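--- One more aside, on the min.compaction.lag.ms discussion (Bruno's point 6 and Victoria's reply above): my understanding is that, today, when history retention changes in the code the user has to bring the existing changelog topic's config in line themselves, roughly along these lines. The application id, store name, and bootstrap server below are made up for illustration; the changelog topic name just follows the usual "<application.id>-<store name>-changelog" pattern.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

static void bumpChangelogCompactionLag(long newHistoryRetentionMs) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder

    try (Admin admin = Admin.create(props)) {
        // Changelog topics are named "<application.id>-<store name>-changelog".
        ConfigResource changelog = new ConfigResource(
            ConfigResource.Type.TOPIC, "my-app-my-versioned-store-changelog");

        // Pin min.compaction.lag.ms to the new history retention, since Streams
        // will not update it on our behalf.
        AlterConfigOp setLag = new AlterConfigOp(
            new ConfigEntry(TopicConfig.MIN_COMPACTION_LAG_MS_CONFIG,
                String.valueOf(newHistoryRetentionMs)),
            AlterConfigOp.OpType.SET);

        admin.incrementalAlterConfigs(
                Collections.singletonMap(changelog, Collections.singletonList(setLag)))
            .all().get();
    }
}

--- End of aside; the quoted thread continues below.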
> > >> > > >> ----------- Matthias's comments ----------- > > >> > > >>> (1) Why does the new store not extend KeyValueStore, but StateStore? > > >> In the end, it's a KeyValueStore? > > >> > > >> A `VersionedKeyValueStore<K, V>` is not a `KeyValueStore<K, V>` > because > > >> many of the KeyValueStore methods would not make sense for a versioned > > >> store. For example, `put(K key, V value)` is not meaningful for a > > versioned > > >> store because the record needs a timestamp associated with it. > > >> > > >> A `VersionedKeyValueStore<K, V>` is more similar to a > `KeyValueStore<K, > > >> ValueAndTimestamp<V>>` (i.e., `TimestampedKeyValueStore<K, V>`), but > > some > > >> of the TimestampedKeyValueStore methods are still problematic. For > > example, > > >> what does it mean for `delete(K key)` to have return type > > >> `ValueAndTimestamp<V>`? Does this mean that `delete(K key)` only > deletes > > >> (and returns) the latest record version for the key? Probably we want > a > > >> versioned store to have `delete(K key)` delete all record versions for > > the > > >> given key, in which case the return type is better suited as an > > >> iterator/collection of KeyValueTimestamp. `putIfAbsent(K key, > > >> ValueAndTimestamp value)` also has ambiguous semantics for versioned > > stores > > >> (i.e., what does it mean for the key/record to be "absent"). > > >> > > >> I agree that conceptually a versioned key-value store is just a > > key-value > > >> store, though. In the future if we redesign the store interfaces, it'd > > be > > >> great to unify them by having a more generic KeyValueStore interface > > that > > >> allows for extra flexibility to support different types of key-value > > >> stores, including versioned stores. (Or, if you can think of a way to > > >> achieve this with the existing interfaces today, I'm all ears!) > > >> > > >>> (2) Should we have a ReadOnlyVersionedKeyValueStore? Even if we don't > > >> want to support IQ in this KIP, it might be good to add this interface > > >> right away to avoid complications for follow up KIPs? Or won't there > by > > >> any complications anyway? > > >> > > >> I don't think there will be complications for refactoring to add this > > >> interface in the future. Refactoring out > ReadOnlyVersionedKeyValueStore > > >> from VersionedKeyValueStore would leave VersionedKeyValueStore > unchanged > > >> from the outside. > > >> > > >> Also, is it true that the ReadOnlyKeyValueStore interface is only used > > for > > >> IQv1 and not IQv2? I think it's an open question as to whether we > should > > >> support IQv1 for versioned stores or only IQv2. If the latter, then > > maybe > > >> we won't need the extra interface at all. > > >> > > >>> (3) Why do we not have a `delete(key)` method? I am ok with not > > >> supporting all methods from existing KV-store, but a `delete(key)` > seems > > >> to be fundamentally to have? > > >> > > >> What do you think the semantics of `delete(key)` should be for > versioned > > >> stores? Should `delete(key)` delete (and return) all record versions > for > > >> the key? Or should we have `delete(key, timestamp)` which is > equivalent > > to > > >> `put(key, null, timestamp)` except with a return type to return > > >> ValueAndTimestamp representing the record it replaced? > > >> > > >> If we have ready alignment on what the interface and semantics for > > >> `delete(key)` should be, then adding it in this KIP sounds good. 
I > just > > >> didn't want the rest of the KIP to be hung up over additional > > interfaces, > > >> given that we can always add extra interfaces in the future. > > >> > > >>> (4a) Do we need `get(key)`? It seems to be the same as `get(key, > > >> MAX_VALUE)`? Maybe is good to have as syntactic sugar though? Just for > > >> my own clarification (should we add something to the JavaDocs?). > > >> > > >> Correct, it is just syntactic sugar. I will add a clarification into > the > > >> Javadocs as you've suggested. > > >> > > >>> (4b) Should we throw an exception if a user queries out-of-bound > > >> instead of returning `null` (in `get(key,ts)`)? > > >> -> You put it into "rejected alternatives", and I understand your > > >> argument. Would love to get input from others about this question > > >> though. -- It seems we also return `null` for windowed stores, so > maybe > > >> the strongest argument is to align to existing behavior? Or do we have > > >> case for which the current behavior is problematic? > > >> > > >> Sure; curious to hear what others think as well. > > >> > > >>> (4c) JavaDoc on `get(key,ts)` says: "(up to store implementation > > >> discretion when this is the case)" -> Should we make it a stricter > > >> contract such that the user can reason about it better (there is WIP > to > > >> make retention time a strict bound for windowed stores atm) > > >> -> JavaDocs on `persistentVersionedKeyValueStore` seems to > suggest a > > >> strict bound, too. > > >> > > >> Ah, great question. I think the question boils down to: do we want to > > >> require that all versioned stores (including custom user > > implementations) > > >> use "history retention" to determine when to expire old record > versions? > > >> > > >> Because the `persistentVersionedKeyValueStore(...)` method returns > > >> instances of the provided RocksDB-based versioned store > implementation, > > >> which does use history retention for this purpose, that's why we can > > very > > >> clearly say that for this store, `get(key, ts)` will return null if > the > > >> provided timestamp bound has fallen out of history retention. The > > reason I > > >> left the `VersionedKeyValueStore#get(key, ts)` Javadoc more generic > > (i.e., > > >> does not mention history retention) is because maybe a user > implementing > > >> their own custom store will choose a different expiry mechanism, e.g., > > keep > > >> the three latest versions for each key regardless of how old the > > timestamps > > >> are. > > >> > > >> If we want to require that all versioned stores use history retention > in > > >> order to determine when to expire old records, then I will certainly > > update > > >> the Javadoc to clarify. This is already a requirement for DSL users > > because > > >> the VersionedBytesStoreSupplier interface requires history retention > to > > be > > >> provided (in order for changelog topic configs to be properly set), so > > it's > > >> just a question of whether we also want to require PAPI users to use > > >> history retention too. I had a look at the existing window stores and > > >> didn't see precedent for requiring all window stores have a standard > > >> "retention time" concept for how long to keep windows, but if we want > to > > >> have a standard "history retention" concept for versioned stores we > > >> certainly can. WDYT? > > >> > > >>> (5a) Do we need to expose `segmentInterval`? 
For windowed-stores, we > > >> also use segments but hard-code it to two (it was exposed in earlier > > >> versions but it seems not useful, even if we would be open to expose > it > > >> again if there is user demand). > > >> > > >> If we want to leave it out of this first KIP (and potentially expose > it > > in > > >> the future), that works for me. The performance benchmarks I ran > suggest > > >> that this parameter greatly impacts store performance though and is > very > > >> workload dependent. If a user reported poor performance using > versioned > > >> stores for their workload, this is the first parameter I would want to > > >> tune. That said, metrics/observability for versioned stores (which > > would be > > >> helpful for determining how this parameter should be adjusted) have > been > > >> deferred to a follow-up KIP, so perhaps that's reason to defer > exposing > > >> this parameter as well. > > >> > > >>> (5b) JavaDocs says: "Performance degrades as more record versions for > > >> the same key are collected in a single segment. On the other hand, > > >> out-of-order writes and reads which access older segments may slow > down > > >> if there are too many segments." -- Wondering if JavaDocs should make > > >> any statements about expected performance? Seems to be an > implementation > > >> detail? > > >> > > >> I included this sentence to explain why a user might want to tune this > > >> value / help guide how to think about the parameter, but if we want to > > >> remove it entirely (per the discussion point above) then this Javadoc > > will > > >> be removed with it. > > >> > > >>> (6) validTo timestamp is "exclusive", right? Ie, if I query > > >> `get(key,ts[=validToV1])` I would get `null` or the "next" record v2 > > >> with validFromV2=ts? > > >> > > >> I actually intended for it to be inclusive (will update the KIP). Do > you > > >> think exclusive is more intuitive? The reason I had inclusive in my > > mind is > > >> because it's like a "AS OF <time>" query, which treats the time bound > as > > >> inclusive. > > >> > > >>> (7) The KIP says, that segments are stores in the same RocksDB -- for > > >> this case, how are efficient deletes handled? For windowed-store, we > can > > >> just delete a full RocksDB. > > >> > > >> The way that multiple segments are represented in the same RocksDB is > > that > > >> the RocksDB keys are prefixed with segment ID. An entire segment is > > deleted > > >> with a single `deleteRange()` call to RocksDB. > > >> > > >>> (8) Rejected alternatives: you propose to not return the validTo > > >> timestamp -- if we find it useful in the future to return it, would > > >> there be a clean path to change it accordingly? > > >> > > >> With the current proposal, there's no clean path. If we think there's > a > > >> good chance we might want to do this in the future, then we should > > update > > >> the proposed interfaces. > > >> > > >> The current proposed return type from `VersionedKeyValueStore<K, > > >> V>#get(key, tsTo)` is `ValueAndTimestamp<V>`. There's no way to add a > > >> second timestamp into `ValueAndTimestamp<V>`, which is why there's no > > clean > > >> path to include validTo timestamp in the future under the existing > > >> proposal. > > >> > > >> If we wanted to allow for including validTo timestamp in the future, > > we'd > > >> instead update the return type to be a new `VersionedRecord<V>` > object. 
> > >> Today a `VersionedRecord<V>` could just include `value` and > `timestamp`, > > >> and in the future we could add `validTo` (names subject to change) > into > > the > > >> `VersionedRecord` as well. (It'd look a little strange for now since > > >> VersionedRecord is the same as ValueAndTimestamp, but that seems > fine.) > > >> > > >> If we choose to do this, I think we should also update the return type > > of > > >> `VersionedKeyValueStore#get(key)` to be VersionedRecord as well, > rather > > >> than having one return TimestampAndValue while the other returns > > >> VersionedRecord. > > >> > > >> ----------- Sagar's comments ----------- > > >> > > >>> 1) Did you consider adding a method similar to : > > >> List<ValueAndTimeStamp<V>> get(K key, long from, long to)? > > >> I think this could be useful considering that this > > >> versioning scheme unlocks time travel at a key basis. WDYT? > > >> > > >> Yes, I do think this method is valuable. I think we will definitely > > want to > > >> support time-range based queries at some point (hopefully soon), and > > likely > > >> also key-range based queries (to achieve feature parity with existing > > >> key-value stores). > > >> > > >> It's not immediately clear to me whether these types of queries should > > be > > >> supported as part of the store interface or if they should only be > > >> supported via the `query(...)` method for IQv2. (It's an open question > > as > > >> to whether we should support IQv1 for versioned stores or only IQv2. A > > >> benefit of IQv2 over IQv1 is that we won't need to add individual > store > > >> methods for each type of query, including for all wrapped store > layers.) > > >> > > >> If we have clear non-IQ use cases for these methods (e.g., use cases > > within > > >> processors), then they'll need to be added as part of the store > > interface > > >> for sure. I'm leaning towards adding them as part of the store > interface > > >> but given the ambiguity here, it may be preferrable to defer to a > > follow-up > > >> KIP. OTOH, if you think the versioned store interface as proposed in > > this > > >> KIP is too bare bones to be useful, I'm open to adding it in now as > > well. > > >> > > >>> 2) I have a similar question as Matthias, about the timestampTo > > argument > > >> when doing a get. Is it inclusive or exclusive? > > >> > > >> Same answer (and follow-up question) as above. Do you think it will be > > >> confusing for `get(key, tsTo)` to use an inclusive time bound, while > > >> `get(key, tsFrom, tsTo)` would use an exclusive tsTo time bound? Maybe > > we > > >> should rename `get(key, tsFrom, tsTo)` to `getVersions(...)` or > > >> `getRange(...)` in order to avoid confusion. > > >> > > >>> 3) validFrom sounds slightly confusing to me. It is essentially the > > >> timestamp at which the record was inserted. validFrom makes it sound > > like > > >> validTo which can keep changing based on new records while *from* is > > fixed. > > >> WDYT? > > >> > > >> "It is essentially the timestamp at which the record was inserted" <-- > > Yes, > > >> that's correct. > > >> > > >> I borrowed the "validFrom/validTo" terminology from temporal tables, > > e.g., > > >> > > >> > > > https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver16 > > >> . > > >> I don't believe the terms "validFrom" or "validTo" are currently > exposed > > >> anywhere in any of the user-facing interfaces (or Javadocs); I just > > needed > > >> a way to refer to the concepts in the KIP. 
Hopefully this is a > non-issue > > >> (at least for now) as a result. Do you have a suggestion for > terminology > > >> that would've been less confusing? > > >> > > >>> 4) Even I think delete api should be supported. > > >> > > >> Makes sense. It'd be to get your input on the same follow-up > questions I > > >> asked Matthias above as well :) > > >> > > >> On Tue, Nov 22, 2022 at 4:25 AM Sagar <sagarmeansoc...@gmail.com> > > wrote: > > >> > > >>> Hi Victoria, > > >>> > > >>> Thanks for the KIP. Seems like a very interesting idea! > > >>> > > >>> I have a couple of questions: > > >>> > > >>> 1) Did you consider adding a method similar to : > > >>> List<ValueAndTimeStamp<V>> get(K key, long from, long to)? > > >>> > > >>> I think this could be useful considering that this > > >>> versioning scheme unlocks time travel at a key basis. WDYT? > > >>> > > >>> 2) I have a similar question as Matthias, about the timestampTo > > argument > > >>> when doing a get. Is it inclusive or exclusive? > > >>> > > >>> 3) validFrom sounds slightly confusing to me. It is essentially the > > >>> timestamp at which the record was inserted. validFrom makes it sound > > like > > >>> validTo which can keep changing based on new records while *from* is > > >> fixed. > > >>> WDYT? > > >>> > > >>> 4) Even I think delete api should be supported. > > >>> > > >>> Thanks! > > >>> Sagar. > > >>> > > >>> On Tue, Nov 22, 2022 at 8:02 AM Matthias J. Sax <mj...@apache.org> > > >> wrote: > > >>> > > >>>> Thanks for the KIP Victoria. Very well written! > > >>>> > > >>>> > > >>>> Couple of questions (many might just require to add some more > details > > >> to > > >>>> the KIP): > > >>>> > > >>>> (1) Why does the new store not extend KeyValueStore, but > > StateStore? > > >>>> In the end, it's a KeyValueStore? > > >>>> > > >>>> (2) Should we have a ReadOnlyVersionedKeyValueStore? Even if we > > don't > > >>>> want to support IQ in this KIP, it might be good to add this > interface > > >>>> right away to avoid complications for follow up KIPs? Or won't there > > by > > >>>> any complications anyway? > > >>>> > > >>>> (3) Why do we not have a `delete(key)` method? I am ok with not > > >>>> supporting all methods from existing KV-store, but a `delete(key)` > > >> seems > > >>>> to be fundamentally to have? > > >>>> > > >>>> (4a) Do we need `get(key)`? It seems to be the same as `get(key, > > >>>> MAX_VALUE)`? Maybe is good to have as syntactic sugar though? Just > for > > >>>> my own clarification (should we add something to the JavaDocs?). > > >>>> > > >>>> (4b) Should we throw an exception if a user queries out-of-bound > > >>>> instead of returning `null` (in `get(key,ts)`)? > > >>>> -> You put it into "rejected alternatives", and I understand > your > > >>>> argument. Would love to get input from others about this question > > >>>> though. -- It seems we also return `null` for windowed stores, so > > maybe > > >>>> the strongest argument is to align to existing behavior? Or do we > have > > >>>> case for which the current behavior is problematic? > > >>>> > > >>>> (4c) JavaDoc on `get(key,ts)` says: "(up to store implementation > > >>>> discretion when this is the case)" -> Should we make it a stricter > > >>>> contract such that the user can reason about it better (there is WIP > > to > > >>>> make retention time a strict bound for windowed stores atm) > > >>>> -> JavaDocs on `persistentVersionedKeyValueStore` seems to > > suggest a > > >>>> strict bound, too. > > >>>> > > >>>> (5a) Do we need to expose `segmentInterval`? 
For windowed-stores, > > we > > >>>> also use segments but hard-code it to two (it was exposed in earlier > > >>>> versions but it seems not useful, even if we would be open to expose > > it > > >>>> again if there is user demand). > > >>>> > > >>>> (5b) JavaDocs says: "Performance degrades as more record versions > > for > > >>>> the same key are collected in a single segment. On the other hand, > > >>>> out-of-order writes and reads which access older segments may slow > > down > > >>>> if there are too many segments." -- Wondering if JavaDocs should > make > > >>>> any statements about expected performance? Seems to be an > > >> implementation > > >>>> detail? > > >>>> > > >>>> (6) validTo timestamp is "exclusive", right? Ie, if I query > > >>>> `get(key,ts[=validToV1])` I would get `null` or the "next" record v2 > > >>>> with validFromV2=ts? > > >>>> > > >>>> (7) The KIP says, that segments are stores in the same RocksDB -- > > for > > >>>> this case, how are efficient deletes handled? For windowed-store, we > > >> can > > >>>> just delete a full RocksDB. > > >>>> > > >>>> (8) Rejected alternatives: you propose to not return the validTo > > >>>> timestamp -- if we find it useful in the future to return it, would > > >>>> there be a clean path to change it accordingly? > > >>>> > > >>>> > > >>>> -Matthias > > >>>> > > >>>> > > >>>> On 11/16/22 9:57 PM, Victoria Xia wrote: > > >>>>> Hi everyone, > > >>>>> > > >>>>> I have a proposal for introducing versioned state stores in Kafka > > >>>> Streams. > > >>>>> Versioned state stores are similar to key-value stores except they > > >> can > > >>>>> store multiple record versions for a single key. This KIP focuses > on > > >>>>> interfaces only in order to limit the scope of the KIP. > > >>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores > > >>>>> > > >>>>> Thanks, > > >>>>> Victoria > > >>>>> > > >>>> > > >>> > > >> > > > > > >