Thanks José for the edits, just one more clarification on the wording of this section
The current implementation parks the FETCH request if all of these > conditions are true: fetch request has a wait time, fetch request has a > wait time, fetch request requires data, fetch request doesn't have enough > data to respond, no errors happened while reading data, no diverging epoch > and no preferred read replica. > In addition to these conditions, the replica manager may also park fetch > requests if the remote replica's HWM is greater than or equal to the local > replica's HWM. Is the intent to say that in addition to all the current conditions, the HWM check is an additional condition that *must *be met in order for the fetch request to be parked? Just confirming since I'm not sure if I should interpret "may also park" too literally. On Wed, Apr 30, 2025 at 9:36 AM Colin McCabe <cmcc...@apache.org> wrote: > Hi José, > > Thanks for the explanation. That makes sense to me. > > That was the biggest quesiton I had about the KIP, I am +1 about the rest > of it. > > Do you have a PR for this yet? It will be interesting to see the latency > improvements. > > regards, > Colin > > > On Wed, Apr 30, 2025, at 08:31, José Armando García Sancio wrote: > > Hi Colin, > > > > On Mon, Apr 28, 2025 at 8:43 PM Colin McCabe <cmcc...@apache.org> wrote: > >> Maybe I missed something, but even after reading the discussion below, > I still don't understand the rationale for separating the "RPC version too > old" and "high watermark not known" cases. Is the idea that separating > these cases will make debugging easier? Like if we see a -1 on the wire, we > know that it is an unknown HWM situation, and not an older RPC version? Or > is there some other reason for separating the two? > > > > For "RPC version too old," we need a default value that would allow us > > to implement the current behavior in KRaft. Only park the request if > > there are no new records. We can use any value for this that is not a > > valid HWM value. I am simply proposing maximum int64 because it makes > > the predicate easier to understand. Consider parking the request if > > "local HWM <= remote HWM" will always be true when the remote HWM is > > maximum int64. > > > > For "Unknown HWM," it means that the remote replica supports this > > feature but at the current state it doesn't know the HWM so the leader > > should consider not parking the request if the local replica knows the > > HWM. Using -1 for unknown gives us the correct behavior if the > > predicate for parking the request is "local HWM <= remote HWM." For > > example: > > 1. If neither replica knows the HWM (-1) then park the request since > > there is nothing new to replicate: "-1 <= -1" is true. > > 2. If the remote replica doesn't know the HWM (-1) but the local > > replica knows the HWM (100) then don't park the request: "100 <= -1" > > is false. > > 3. If the remote replica knows the HWM (100) and it hasn't changed > > then park the request: "100 <= 100" is true. > > 3. If the remote replica knows the HWM (100) but it has changed (110) > > since the last fetch request: "110 <= 100" is false. > > > > Hope that helps clarify the different cases, > > -- > > -José >