We can monitor the replica-related metrics below. Try tuning "replica.lag.time.max.ms" and "replica.fetch.max.bytes", and look for broker logs starting with "Shrinking ISR for partition ...".
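If it helps, here is a rough, untested sketch of reading a couple of these MBeans over JMX from one broker. It assumes remote JMX is enabled on the broker (e.g. by setting JMX_PORT=9999 before starting it); the host, port, and class name are placeholders of mine, not anything from this thread.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReplicaHealthCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port -- point this at one of your brokers.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbean = connector.getMBeanServerConnection();

            // Meter MBeans expose OneMinuteRate; gauge MBeans expose Value.
            Object shrinkRate = mbean.getAttribute(
                    new ObjectName("kafka.server:type=ReplicaManager,name=IsrShrinksPerSec"),
                    "OneMinuteRate");
            Object expandRate = mbean.getAttribute(
                    new ObjectName("kafka.server:type=ReplicaManager,name=IsrExpandsPerSec"),
                    "OneMinuteRate");
            Object maxLag = mbean.getAttribute(
                    new ObjectName("kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica"),
                    "Value");

            System.out.println("IsrShrinksPerSec (1-min rate): " + shrinkRate);
            System.out.println("IsrExpandsPerSec (1-min rate): " + expandRate);
            System.out.println("Replica fetcher MaxLag (messages): " + maxLag);
        } finally {
            connector.close();
        }
    }
}

Shrinks and expands should roughly track each other; sustained shrinks with no matching expands, or a steadily growing MaxLag, usually mean the followers cannot keep up with the leaders. The MBean names are: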
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)

On Thu, Oct 25, 2018 at 7:18 PM Suman B N <sumannew...@gmail.com> wrote:

> Still looking for a response here. Please assist.
>
> On Sat, Oct 20, 2018 at 12:43 AM Suman B N <sumannew...@gmail.com> wrote:
>
> > The rate of ingestion is not 150-200 rps. It is 150k-200k rps.
> >
> > On Fri, Oct 19, 2018 at 11:12 PM Suman B N <sumannew...@gmail.com> wrote:
> >
> >> Team,
> >> We have been observing some partitions being under-replicated. Broker
> >> version is 0.10.2.1. The actions below were carried out, but in vain:
> >>
> >> - Tried restarting nodes.
> >> - Tried increasing replica fetcher threads. Please recommend the ideal
> >> number of replica fetcher threads for a 20-node cluster with 150-200 rps
> >> spread across 1000 topics and 3000 partitions.
> >> - Tried increasing network threads. (I think this doesn't have any
> >> effect, but still wanted to try.) Please recommend the ideal number of
> >> network threads for the same cluster.
> >>
> >> The logs look very clean. No exceptions. I don't have much idea of how
> >> replica fetcher threads and logs can be debugged, so I am asking for help
> >> here. Any help or leads would be appreciated.
> >>
> >> --
> >> *Suman*
> >> *OlaCabs*
> >
> > --
> > *Suman*
> > *OlaCabs*
>
> --
> *Suman*
> *OlaCabs*