Hi Colin,

On Mon, Apr 28, 2025 at 8:43 PM Colin McCabe <cmcc...@apache.org> wrote:
> Maybe I missed something, but even after reading the discussion below, I
> still don't understand the rationale for separating the "RPC version too old"
> and "high watermark not known" cases. Is the idea that separating these cases
> will make debugging easier? Like if we see a -1 on the wire, we know that it
> is an unknown HWM situation, and not an older RPC version? Or is there some
> other reason for separating the two?

For "RPC version too old," we need a default value that lets us implement the current behavior in KRaft: only park the request if there are no new records. We can use any value for this that is not a valid HWM value. I am simply proposing the maximum int64 because it makes the predicate easier to understand: the parking condition "local HWM <= remote HWM" is always true when the remote HWM is the maximum int64.

For "Unknown HWM," the remote replica supports this feature but does not currently know the HWM, so the leader should consider not parking the request if the local replica knows the HWM. Using -1 for unknown gives us the correct behavior if the predicate for parking the request is "local HWM <= remote HWM." For example:

1. If neither replica knows the HWM (-1) then park the request since there is nothing new to replicate: "-1 <= -1" is true.
2. If the remote replica doesn't know the HWM (-1) but the local replica knows the HWM (100) then don't park the request: "100 <= -1" is false.
3. If the remote replica knows the HWM (100) and it hasn't changed then park the request: "100 <= 100" is true.
4. If the remote replica knows the HWM (100) but the local HWM has changed (110) since the last fetch request then don't park the request: "110 <= 100" is false.

Hope that helps clarify the different cases,
--
-José
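
P.S. To make the cases enumerated above concrete, here is a minimal sketch of the predicate. It is only an illustration, not the KRaft implementation: the class name, method name, and constant names are hypothetical, and it assumes the "no new records" check happens elsewhere.

```java
final class FetchParkingSketch {
    // Hypothetical name: the remote replica supports the field but does not know its HWM yet.
    static final long UNKNOWN_HWM = -1L;
    // Hypothetical name: default used when the fetch request comes from an RPC version
    // that does not carry the field at all.
    static final long OLD_RPC_DEFAULT_HWM = Long.MAX_VALUE;

    // True when the leader has no newer HWM to report, so the request may be parked.
    // Assumption: the caller has already established that there are no new records.
    static boolean shouldPark(long localHwm, long remoteHwm) {
        return localHwm <= remoteHwm;
    }

    public static void main(String[] args) {
        System.out.println(shouldPark(UNKNOWN_HWM, UNKNOWN_HWM));  // true: neither replica knows the HWM
        System.out.println(shouldPark(100L, UNKNOWN_HWM));         // false: only the local replica knows it
        System.out.println(shouldPark(100L, 100L));                // true: HWM unchanged
        System.out.println(shouldPark(110L, 100L));                // false: local HWM has changed
        System.out.println(shouldPark(110L, OLD_RPC_DEFAULT_HWM)); // true: old RPC version
    }
}
```

With both sentinels, a single comparison covers the old-RPC case and the unknown-HWM case without any special-casing in the leader.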