Re: [DISCUSS] KIP-1166: Improve high-watermark replication

Jun Rao Wed, 30 Apr 2025 10:09:31 -0700

Hi, Jose,

Thanks for updating the KIP. A few more comments.


JR5. The motivation section still sounds like this is a KRaft only problem.
It would be useful to cover the issue with fetch from followers too.

JR6. Since there is an inter broker RPC change, we need to document the
upgrade path with a new MV.

JR7. Could we document whether the consumer client should set the new field
or not?

Jun

On Wed, Apr 30, 2025 at 9:52 AM Alyssa Huang <ahu...@confluent.io.invalid>
wrote:

> Thanks José for the edits, just one more clarification on the wording of
> this section
>
> The current implementation parks the FETCH request if all of these
> > conditions are true: fetch request has a wait time, fetch request has a
> > wait time, fetch request requires data, fetch request doesn't have enough
> > data to respond, no errors happened while reading data, no diverging
> epoch
> > and no preferred read replica.
> > In addition to these conditions, the replica manager may also park fetch
> > requests if the remote replica's HWM is greater than or equal to the
> local
> > replica's HWM.
>
>
> Is the intent to say that in addition to all the current conditions,
> the HWM check is an additional condition that *must *be met in order for
> the fetch request to be parked? Just confirming since I'm not sure if I
> should interpret "may also park" too literally.
>
> On Wed, Apr 30, 2025 at 9:36 AM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi José,
> >
> > Thanks for the explanation. That makes sense to me.
> >
> > That was the biggest quesiton I had about the KIP, I am +1 about the rest
> > of it.
> >
> > Do you have a PR for this yet? It will be interesting to see the latency
> > improvements.
> >
> > regards,
> > Colin
> >
> >
> > On Wed, Apr 30, 2025, at 08:31, José Armando García Sancio wrote:
> > > Hi Colin,
> > >
> > > On Mon, Apr 28, 2025 at 8:43 PM Colin McCabe <cmcc...@apache.org>
> wrote:
> > >> Maybe I missed something, but even after reading the discussion below,
> > I still don't understand the rationale for separating the "RPC version
> too
> > old" and "high watermark not known" cases. Is the idea that separating
> > these cases will make debugging easier? Like if we see a -1 on the wire,
> we
> > know that it is an unknown HWM situation, and not an older RPC version?
> Or
> > is there some other reason for separating the two?
> > >
> > > For "RPC version too old," we need a default value that would allow us
> > > to implement the current behavior in KRaft. Only park the request if
> > > there are no new records. We can use any value for this that is not a
> > > valid HWM value. I am simply proposing maximum int64 because it makes
> > > the predicate easier to understand. Consider parking the request if
> > > "local HWM <= remote HWM" will always be true when the remote HWM is
> > > maximum int64.
> > >
> > > For "Unknown HWM," it means that the remote replica supports this
> > > feature but at the current state it doesn't know the HWM so the leader
> > > should consider not parking the request if the local replica knows the
> > > HWM. Using -1 for unknown gives us the correct behavior if the
> > > predicate for parking the request is "local HWM <= remote HWM." For
> > > example:
> > > 1. If neither replica knows the HWM (-1) then park the request since
> > > there is nothing new to replicate: "-1 <= -1" is true.
> > > 2. If the remote replica doesn't know the HWM (-1) but the local
> > > replica knows the HWM (100) then don't park the request: "100 <= -1"
> > > is false.
> > > 3. If the remote replica knows the HWM (100) and it hasn't changed
> > > then park the request: "100 <= 100" is true.
> > > 3. If the remote replica knows the HWM (100) but it has changed (110)
> > > since the last fetch request: "110 <= 100" is false.
> > >
> > > Hope that helps clarify the different cases,
> > > --
> > > -José
> >
>

Re: [DISCUSS] KIP-1166: Improve high-watermark replication

Reply via email to