Hi Jun, Thanks for reviewing the KIP.
1) When we need metadata, leastLoadedNode() attempts connection to node1 and if that fails, node1 is in reconnection backoff for a second. So leastLoadedNode() attempts connection to node2. If the connection attempt to node2 takes 5 seconds and fails, we try leastLoadedNode again. This time, node1 is no longer in reconnection backoff because its backoff period of one second has elapsed. With the metadata recovery strategy in KIP-899, all nodes have to be unavailable to trigger rebootstrap. Since node1 is no longer in backoff, we will attempt node1 again rather than rebootstrap. 2) Good point. Since we are changing the default rebootstrap strategy, it makes sense to include share consumer and KStreams as well in this KIP. Looks like we have WorkerConfig in Connect too. Will add them to the KIP. Regards, Rajini On Thu, Nov 7, 2024 at 1:32 AM Jun Rao <j...@confluent.io.invalid> wrote: > Hi, Rajini, > > Thanks for the KIP. Looks good to me overall. Just a couple of minor > comments. > > 1. "So if a connection attempt is stuck for over a second, other nodes that > are in reconnection backoff state will no longer be in backoff and as a > result, rebootstrap will never be attempted." Not sure that I understand > this. Why would a stuck connection to one node affect the reconnection > backoff state to other nodes? > > 2. metadata.recovery.strategy and metadata.recovery.rebootstrap.trigger.ms > : > Should we add them to share consumer and kstreams too or leave that for > future KIPs? > > Jun > > On Mon, Nov 4, 2024 at 7:38 AM Rajini Sivaram <rajinisiva...@gmail.com> > wrote: > > > Hi Apoorv, > > > > Thanks for the review. Good idea to prefix with metadata.recovery, I > have > > renamed the config to metadata.recovery.rebootstrap.trigger.ms as you > > suggested. > > > > Regards, > > > > Rajini > > > > On Mon, Nov 4, 2024 at 2:43 PM Apoorv Mittal <apoorvmitta...@gmail.com> > > wrote: > > > > > Hi Rajni, > > > Thanks for the KIP. It would be a good addition. > > > > > > If I understand it correctly then the client will trigger the > > re-bootstrap > > > process when the client is unable to obtain metadata in a defined > period > > of > > > time. The proposed configuration "rebootstrap.timeout.ms" seems to me > > more > > > of a timeout for the re-bootstrap process, rather than the time to > > trigger > > > the re-bootstrap process. > > > > > > Do you think "metadata.recovery.rebootstrap.trigger.ms" would be a > > better > > > name? > > > > > > - > > > > > > *metadata.recovery.rebootstrap.trigger.ms > > > <http://metadata.recovery.rebootstrap.trigger.ms>:* This is a clear > > and > > > concise option. It directly conveys the purpose of the > configuration, > > > which > > > is to define the timeout for triggering a re-bootstrap when metadata > > > cannot > > > be obtained. It specifies not only the trigger but also the recovery > > > strategy (re-bootstrap) associated with it. > > > > > > Additionally, the other option I was thinking of was " > > > metadata.rebootstrap.trigger.ms". > > > > > > Regards, > > > Apoorv Mittal > > > > > > > > > On Mon, Nov 4, 2024 at 10:17 AM Rajini Sivaram < > rajinisiva...@gmail.com> > > > wrote: > > > > > > > Hi all, > > > > > > > > Any other feedback or suggestions? If there are no concerns, I will > > start > > > > the vote tomorrow. > > > > > > > > Thank you, > > > > > > > > Rajini > > > > > > > > > > > > On Wed, Oct 30, 2024 at 4:27 PM Rajini Sivaram < > > rajinisiva...@gmail.com> > > > > wrote: > > > > > > > > > Hi Andrew, > > > > > > > > > > Thanks for reviewing the KIP. > > > > > > > > > > AS1: Updated to KIP-559. Thanks for pointing that out. > > > > > AS2: Yes, that is correct. The error code is a performance > > optimization > > > > to > > > > > avoid unavailability for 5 minutes when a proxy may be able to tell > > the > > > > > client to rebootstrap immediately. > > > > > > > > > > Regards, > > > > > > > > > > Rajini > > > > > > > > > > > > > > > > > > > > On Wed, Oct 30, 2024 at 2:51 PM Andrew Schofield < > > > > > andrew_schofield_j...@outlook.com> wrote: > > > > > > > > > >> Hi Rajini, > > > > >> Thanks for the KIP. It looks like a useful improvement for > > > > >> rebootstrapping. > > > > >> I really like using the new major release as a way to change the > > > default > > > > >> so > > > > >> rebootstrapping can be taken for granted in the future. > > > > >> > > > > >> A couple of comments. > > > > >> AS1: (nit) KIP-559, not KIP-599, was the proxy-friendliness KIP. > > > > >> AS2: I suppose that the new error code is really just a > performance > > > > >> optimisation for situations where the proxy can tell that > > > > rebootstrapping > > > > >> is required and it can avoid the clients waiting the full 5 > minutes. > > > Is > > > > >> this > > > > >> correct or is there more to it? > > > > >> > > > > >> Thanks, > > > > >> Andrew > > > > >> > > > > >> ________________________________________ > > > > >> From: Rajini Sivaram <rajinisiva...@gmail.com> > > > > >> Sent: 28 October 2024 19:31 > > > > >> To: dev <dev@kafka.apache.org> > > > > >> Subject: [DISCUSS] KIP-1102: Enable clients to rebootstrap based > on > > > > >> timeout or error code > > > > >> > > > > >> Hi everyone, > > > > >> > > > > >> I would like to start discussion on KIP-1102 ( > > > > >> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1102%3A+Enable+clients+to+rebootstrap+based+on+timeout+or+error+code > > > > >> ). > > > > >> This KIP extends KIP-899 by introducing a timeout configuration > and > > > > error > > > > >> code that can be used to trigger rebootstrapping in clients. The > KIP > > > > also > > > > >> proposes to enable rebootstrapping by default in 4.0.0. > > > > >> > > > > >> Thank you, > > > > >> > > > > >> Rajini > > > > >> > > > > > > > > > > > > > > >