Hi José, Thanks for the explanation. That makes sense to me.
That was the biggest quesiton I had about the KIP, I am +1 about the rest of it. Do you have a PR for this yet? It will be interesting to see the latency improvements. regards, Colin On Wed, Apr 30, 2025, at 08:31, José Armando García Sancio wrote: > Hi Colin, > > On Mon, Apr 28, 2025 at 8:43 PM Colin McCabe <cmcc...@apache.org> wrote: >> Maybe I missed something, but even after reading the discussion below, I >> still don't understand the rationale for separating the "RPC version too >> old" and "high watermark not known" cases. Is the idea that separating these >> cases will make debugging easier? Like if we see a -1 on the wire, we know >> that it is an unknown HWM situation, and not an older RPC version? Or is >> there some other reason for separating the two? > > For "RPC version too old," we need a default value that would allow us > to implement the current behavior in KRaft. Only park the request if > there are no new records. We can use any value for this that is not a > valid HWM value. I am simply proposing maximum int64 because it makes > the predicate easier to understand. Consider parking the request if > "local HWM <= remote HWM" will always be true when the remote HWM is > maximum int64. > > For "Unknown HWM," it means that the remote replica supports this > feature but at the current state it doesn't know the HWM so the leader > should consider not parking the request if the local replica knows the > HWM. Using -1 for unknown gives us the correct behavior if the > predicate for parking the request is "local HWM <= remote HWM." For > example: > 1. If neither replica knows the HWM (-1) then park the request since > there is nothing new to replicate: "-1 <= -1" is true. > 2. If the remote replica doesn't know the HWM (-1) but the local > replica knows the HWM (100) then don't park the request: "100 <= -1" > is false. > 3. If the remote replica knows the HWM (100) and it hasn't changed > then park the request: "100 <= 100" is true. > 3. If the remote replica knows the HWM (100) but it has changed (110) > since the last fetch request: "110 <= 100" is false. > > Hope that helps clarify the different cases, > -- > -José