Thanks for the KIP, Satish. I am trying to understand the problem we are looking to solve with this KIP. When the leader is slow in processing fetch requests from the follower (due to disk, GC, or other reasons), the primary problem is that it could impact read and write latency and at times cause unavailability depending on how long the leader continues to be in this state.
How does solution 1 solve the problem? It seems like it prevents followers from being removed from the ISR but that by itself would not address the availability problem, is that right?

- Dhruvil

On Wed, Jun 23, 2021 at 6:12 AM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Satish, we encounter this frequently and consider it a major bug. Your
> solution makes sense to me.
>
> Ryanne
>
> On Tue, Jun 22, 2021, 7:29 PM Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > Hi,
> > Bumping up the discussion thread on KIP-501 about avoiding out-of-sync or
> > offline partitions when follower fetch requests are not processed in time
> > by the leader replica. This issue occurred several times in multiple
> > production environments (at Uber, Yelp, Twitter, etc).
> >
> > KIP-501 is located here
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+out-of-sync+or+offline+partitions+when+follower+fetch+requests+are+not+processed+in+time>.
> > You may want to look at the earlier mail discussion thread here
> > <https://mail-archives.apache.org/mod_mbox/kafka-dev/202002.mbox/%3Cpony-9f4e96e457398374499ab892281453dcaa7dc679-11722f366b06d9f46bcb5905ff94fd6ab167598e%40dev.kafka.apache.org%3E>,
> > and here
> > <https://mail-archives.apache.org/mod_mbox/kafka-dev/202002.mbox/%3CCAM-aUZnJ4z%2B_ztjF6sXSL61M1me0ogWZ1BV6%2BoV45rJMG8EoZA%40mail.gmail.com%3E>.
> >
> > Please take a look, I would like to hear your feedback and suggestions.
> >
> > Thanks,
> > Satish.