Thanks Harsha, I will play with these settings. -Ashish
Sent from Yahoo Mail for iPhone

On Monday, January 28, 2019, 2:13 PM, Harsha Chintalapani <ka...@harsha.io> wrote:

We’ve seen something similar in our setup and, as you noticed, it happens infrequently. Based on my debugging, a few things might be causing this issue. Two of them are:

1. replica.lag.time.max.ms, set to 10 secs by default
2. replica.socket.timeout.ms, set to 30 secs by default

In situations where the broker is busy with lots of clients, a follower makes a replica fetch request and that request can take longer or time out, i.e. it waits for 30 secs and doesn’t get any response. The ReplicaManager thread calls maybeShrinkISR and shrinks the ISR if there is no call from a follower within replica.lag.time.max.ms, which is possible under heavy load. Given that the socket timeout itself takes 30 secs, the follower can be marked as not in the ISR. What we’ve seen is shrinkISR and expandISR happening back to back, i.e. one call times out and the subsequent call makes the follower part of the ISR again.

One option to try is to lower the socket timeout and increase replica.lag.time.max.ms.

Thanks,
Harsha

On Jan 27, 2019, 8:48 AM -0800, Ashish Karalkar <ashish_karal...@yahoo.com.INVALID>, wrote:
> Hi Harsha,
> Thanks for the reply.
> The issue is resolved as of now; the root cause was a runaway application
> spawning many instances of kafkacat and hammering the Kafka brokers. I am still
> wondering what the reason for the shrink and expand could be when a client
> hammers a broker.
> --Ashish
> On Thursday, January 24, 2019, 8:53:10 AM PST, Harsha Chintalapani
> <ka...@harsha.io> wrote:
>
> Hi Ashish,
> What's your replica.lag.time.max.ms set to, and do you see any
> network issues between brokers?
> -Harsha
>
> On Jan 22, 2019, 10:09 PM -0800, Ashish Karalkar
> <ashish_karal...@yahoo.com.INVALID>, wrote:
> > Hi All,
> > We just upgraded from 0.10.x to 1.1 and enabled rack awareness on an
> > existing cluster which has about 20 nodes in 4 racks.
> > After this we see that a few brokers go into a continuous cycle of
> > expanding the ISR and shrinking it to itself; it is also causing high
> > latency for serving metadata requests.
> > What is the impact of enabling rack awareness on an existing cluster,
> > assuming the replication factor is 3 and the existing replicas may or may
> > not be in different racks when rack awareness was enabled, after which a
> > rolling bounce was done?
> > Symptoms we are having are replica lag and slow metadata requests. Also, in
> > the brokers' logs we continuously see disconnections from the broker where
> > it is trying to expand.
> > Thanks for helping
> > --A
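
The timing Harsha describes in his January 28 reply can be sketched as a toy Python model. This is only an illustration under stated assumptions, not Kafka's actual ReplicaManager code: the Follower class, should_shrink, simulate, and the "tuned" values (30 s lag limit, 5 s socket timeout) are all hypothetical and were not given in the thread. The model assumes a single fetch that hangs for the full socket timeout, followed by a retry that succeeds immediately.

```python
# Toy model of the ISR shrink/expand timing discussed in the thread.
# NOT Kafka's implementation; every name here is hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Follower:
    # Last time (ms) the follower's fetch caught up with the leader.
    last_caught_up_ms: int


def should_shrink(follower: Follower, now_ms: int, replica_lag_time_max_ms: int) -> bool:
    """Leader-side check: drop the follower if it has not caught up recently enough."""
    return now_ms - follower.last_caught_up_ms > replica_lag_time_max_ms


def simulate(replica_lag_time_max_ms: int, replica_socket_timeout_ms: int) -> List[str]:
    """One fetch hangs until the socket timeout, then the retried fetch succeeds."""
    events: List[str] = []
    follower = Follower(last_caught_up_ms=0)
    in_isr = True

    # The busy leader never answers; the follower gives up after the socket timeout.
    now_ms = replica_socket_timeout_ms
    if in_isr and should_shrink(follower, now_ms, replica_lag_time_max_ms):
        in_isr = False
        events.append("shrink")

    # The retried fetch succeeds, so the follower is caught up again.
    follower.last_caught_up_ms = now_ms
    if not in_isr:
        in_isr = True
        events.append("expand")
    return events


if __name__ == "__main__":
    # Defaults quoted in the thread: 10 s lag limit, 30 s socket timeout.
    print(simulate(10_000, 30_000))
    # Hypothetical tuned values: higher lag limit, lower socket timeout.
    print(simulate(30_000, 5_000))
```

With the defaults quoted in the thread, a single hung fetch outlasts the lag limit, so the model records a shrink followed by an expand. With the socket timeout below the lag limit, the same hung fetch no longer triggers a shrink. Real brokers have more moving parts (multiple partitions, fetch sizes, scheduler intervals), so treat this only as an aid to reasoning about the two settings.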