The voting thread for this KIP is open. Thanks, Guozhang and Matthias, for
already having voted!

On Fri, Mar 17, 2023 at 1:18 PM Guozhang Wang <guozhang.wang...@gmail.com>
wrote:

> Thanks Matthias / Victoria, both bullet points make sense to me.
>
> On Thu, Mar 16, 2023 at 10:39 AM Victoria Xia
> <victoria....@confluent.io.invalid> wrote:
> >
> > Thanks for your comments, Matthias!
> >
> > > For stream-table joins, I think we need to elaborate that a `get(k,
> ts)`
> > call now might return `null` if the history retention of the store is too
> > short.
> >
> > Great callout -- I agree we should definitely clarify this in the KIP and
> > mention it in the eventual docs as well.
> >
> > When a call to `get(k, ts)` returns null, there's not really a good way
> to
> > distinguish whether it's because the timestamp is outside of the store's
> > history retention or if it's because there's actually no record version
> for
> > the key at the specified timestamp. Determining this from the processor
> > would require (1) exposing the store's history retention to the
> processor,
> > and (2) reconciling the fact that state stores today (including the new
> > versioned store implementation) track their own observed stream time
> > separate from processor time.
> >
> > In light of this, I think your proposal to treat a null from `get(k, ts)`
> > due to history retention having been exceeded the same as we'd treat any
> > other null makes sense, and is also our only viable option right now.
> I'll
> > call this out in the docs so users are aware that their choice of history
> > retention has this implication.
> >
> > > For left-table-table joins, there seems to be no special impact, but it
> > should be called out, too. The lookup itself does not go into the history
> > of the table so no change here (as we don't have the "query older than
> > history case")
> >
> > Yup, we're on the same page. Using a versioned store for table-table
> joins
> > results in the semantic change that the join result will include the
> > latest-by-timestamp record rather than the latest-by-offset record, but
> no
> > timestamped lookups (i.e., `get(key, ts)` calls) are used in the process
> so
> > there is no concern about history retention having elapsed and affecting
> > join results. (The only implication of history retention for this use
> case
> > is indirect, since history retention doubles as grace period for the
> store.
> > Because grace period is per store instance, which has task-level
> > granularity, that means if grace period is set too low then the latest
> > record for one key could be dropped from the store if another key has
> > already advanced the store's observed stream time past the grace period
> by
> > the time that this record is seen.)
> >
> > I will update the KIP with these additional notes.
> >
> > Thanks,
> > Victoria
> >
> > On Wed, Mar 15, 2023 at 7:16 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >
> > > Thanks for the KIP! Great to see a first step towards using the new
> > > versioned stores!
> > >
> > > I think the described tradeoffs make sense and I like make a pragmatic
> > > step into the right direction, and avoid boiling the ocean. Thus, I
> > > agree to the proposed solution.
> > >
> > > One minor thing, that I believe just need clarification in the KIP
> (does
> > > not seem to be a change to the KIP itself):
> > >
> > > For stream-table joins, I think we need to elaborate that a `get(k,
> ts)`
> > > call now might return `null` if the history retention of the store is
> > > too short. For inner-joins it would result in no output record (ie,
> > > stream input record is dropped). Would be good to have it mentioned in
> > > the KIP explicitly.
> > >
> > > We should also discuss how left-joins should work for this case. I
> think
> > > it's ok (better) to include the stream record in the result if the
> > > lookup returns `null` -- either because no key exist in the exiting
> > > history for the provided timestamp, or (the actual case in question)
> > > because we query older than available history. If you agree, can we add
> > > this to the KIP?
> > >
> > > For left-table-table joins, there seems to be no special impact, but it
> > > should be called out, too. The lookup itself does not go into the
> > > history of the table so no change here (as we don't have the "query
> > > older than history case") -- and for out-of-order records, we just
> > > "drop" them anyway, so no change for left-joins either I believe.
> > >
> > >
> > > -Matthias
> > >
> > >
> > >
> > > On 3/15/23 2:00 PM, Guozhang Wang wrote:
> > > > Sounds good to me. Thanks!
> > > >
> > > > On Wed, Mar 15, 2023 at 12:07 PM Victoria Xia
> > > > <victoria....@confluent.io.invalid> wrote:
> > > >>
> > > >> Thanks for kicking off the discussion, John and Guozhang!
> > > >>
> > > >>> Just one thing that might be out of scope: if users want to enable
> the
> > > >> versioned table feature across the topology, should we allow them
> to do
> > > it
> > > >> via a single config rather than changing the materialized object at
> each
> > > >> place?
> > > >>
> > > >> Yes, I think this would be a great usability improvement and am in
> > > favor of
> > > >> introducing such a config. As long as the config defaults to using
> > > >> unversioned stores (which makes sense anyway), there will be no
> > > >> compatibility concerns with introducing the config in a future
> release.
> > > >> It's out of scope for this particular KIP as a result, but can
> > > hopefully be
> > > >> introduced as part of the next release after 3.5.
> > > >>
> > > >> Best,
> > > >> Victoria
> > > >>
> > > >> On Wed, Mar 15, 2023 at 10:49 AM Guozhang Wang <
> > > guozhang.wang...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Thanks Victoria for the great writeup, with a thorough analysis and
> > > >>> trade-offs. I do not have any major questions about the proposal.
> > > >>>
> > > >>> Just one thing that might be out of scope: if users want to enable
> the
> > > >>> versioned table feature across the topology, should we allow them
> to
> > > >>> do it via a single config rather than changing the materialized
> object
> > > >>> at each place? Maybe we can defer that for future discussions, but
> > > >>> just want to hear your thoughts.
> > > >>>
> > > >>> Anyways, I think this proposal is great just as-is even if we
> agree to
> > > >>> do the configuration improvement later.
> > > >>>
> > > >>>
> > > >>> Guozhang
> > > >>>
> > > >>> On Thu, Mar 9, 2023 at 7:52 PM John Roesler <vvcep...@apache.org>
> > > wrote:
> > > >>>>
> > > >>>> Thanks for the KIP, Victoria!
> > > >>>>
> > > >>>> I had some questions/concerns, but you addressed them in the
> Rejected
> > > >>> Alternatives section. Thanks for the thorough proposal!
> > > >>>>
> > > >>>> -John
> > > >>>>
> > > >>>> On Thu, Mar 9, 2023, at 18:59, Victoria Xia wrote:
> > > >>>>> Hi everyone,
> > > >>>>>
> > > >>>>> I have a proposal for updating Kafka Streams's stream-table join
> and
> > > >>>>> table-table join semantics for the new versioned key-value state
> > > stores
> > > >>>>> introduced in KIP-889
> > > >>>>> <
> > > >>>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores
> > > >>>> .
> > > >>>>> Would love to hear your thoughts and suggestions.
> > > >>>>>
> > > >>>>>
> > > >>>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+Join+Processor+Semantics+for+Versioned+Stores
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Victoria
> > > >>>
> > >
>

Reply via email to