GC pauses (in milliseconds) for all brokers in the cluster for two
minutes around one recent episode:

  min  max  count
    7    8     51
    6    8     49
    6    9     53   <-- broker removed from isr
    6    9     55
    7    9     54
    7   10     52
    6   11     56
    7   11     54
    6   12     58
    7   14     52
   10   18     53   <-- leader of affected partitions
   12   24     56
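(In case it's useful to anyone checking their own brokers: per-pause durations
like the above can be collected in-process with the HotSpot GC notification
beans. A minimal sketch, assuming a HotSpot JVM and an illustrative one-minute
reporting interval; it observes the JVM it runs in, so it would have to be
loaded into the broker, e.g. as a java agent:)

import com.sun.management.GarbageCollectionNotificationInfo;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class GcPauseStats {
    public static void main(String[] args) throws InterruptedException {
        // Per-pause durations (ms) collected from HotSpot GC notifications.
        List<Long> pausesMs = new CopyOnWriteArrayList<>();

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gc instanceof NotificationEmitter)) continue;
            ((NotificationEmitter) gc).addNotificationListener((notification, handback) -> {
                if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                    pausesMs.add(info.getGcInfo().getDuration());  // pause duration in ms
                }
            }, null, null);
        }

        // Report min/max/count once a minute (interval is illustrative).
        while (true) {
            Thread.sleep(60_000);
            if (pausesMs.isEmpty()) continue;
            long min = pausesMs.stream().mapToLong(Long::longValue).min().getAsLong();
            long max = pausesMs.stream().mapToLong(Long::longValue).max().getAsLong();
            System.out.printf("min %d  max %d  count %d%n", min, max, pausesMs.size());
            pausesMs.clear();
        }
    }
}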
So, all the pauses are brief (low tens of ms), and I wouldn't say the
affected brokers are outliers wrt the unaffected brokers.

On Fri, May 3, 2019 at 12:20 AM Yz Xyz <oracl...@gmail.com> wrote:
>
> Hi Steven - what are the pause times, GC and memory usage on the Kafka
> brokers? When pause times are high, the Kafka broker disconnects from
> ZK, drops out of the cluster, and rejoins later, which causes the ISR
> to shrink and expand.
>
> > On May 2, 2019, at 5:39 PM, Steven Taschuk <stasc...@stripe.com.invalid> wrote:
> >
> > Our kafka broker logs show episodes where several partitions have
> > "Shrinking ISR ..." messages followed, usually less than 2 seconds
> > later, by corresponding "Expanding ISR ..." messages that restore the
> > original set of brokers for all the partitions. Does anyone have any
> > suggestions on how to investigate this?
> >
> > On our main cluster, consisting of 12 brokers running kafka 2.0.1,
> > with 2500-3000 partitions in 300-400 topics (each partition having 3
> > replicas), we see about 10 such episodes a day, each involving
> > typically 5-8 partitions. The set of partitions varies but often
> > repeats or nearly repeats; to illustrate that, here are the partitions
> > affected in one broker's episodes over the past week:
> >   2019-04-27T17:51  status-4,dining-11,tax-0,education-0,government-7,locker-19,credit-11,law-15
> >   2019-04-27T18:01  family-14,health-13,golf-2,phone-14,news-15,peace-13
> >   2019-04-27T22:59  stock-13,income-2,insurance-0,college-0,district-1,breast-8,back-41
> >   2019-04-28T02:35  mg-12,executive-18,nursing-12
> >   2019-04-28T02:51  mg-12,executive-18,nursing-12
> >   2019-04-28T08:34  health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
> >   2019-04-28T12:15  health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
> >   2019-04-28T19:03  health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
> >   2019-04-28T20:16  health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
> >   2019-04-29T18:16  climate-11,lemon-12,faculty-22,side-4,music-12,police-0,room-0,press-8,parking-3,subject-10,blood-3
> >   2019-04-30T22:54  living-10,death-15,drug-3,talk-12,cell-12
> >   2019-05-01T16:52  oil-4,community-5,ice-7,public-2,substance-0,grocery-9,carbon-27,g-12
> >   2019-05-01T17:01  child-16,community-5,ice-7,public-2,grocery-9,carbon-27,task-3,g-12
> >   2019-05-01T17:36  community-5,ice-7,public-2,grocery-9,carbon-27,g-12
> >   2019-05-01T22:14  school-9,interest-2,kitchen-0,hotel-3,carbon-3,heart-14
> > (I've replaced the actual topic names with common words.)
> >
> > We see similar behaviour on another cluster, consisting of 7 brokers
> > running kafka 2.2.0.
> >
> > We haven't found anything unusual in the surrounding logs, or in
> > metrics about the network and disk activity of the brokers.
> >
> > Some similar-looking issues from jira:
> >   https://issues.apache.org/jira/browse/KAFKA-4003
> >     similar in that the expand happens within a second or two of the shrink
> >   https://issues.apache.org/jira/browse/KAFKA-4674
> >   https://issues.apache.org/jira/browse/KAFKA-3916
> >     both of these involve disconnections, which we don't see
> >   https://issues.apache.org/jira/browse/KAFKA-7152
> >     talks about a constant churn of shrink/expand, which we don't see
> >     (also, it was fixed in 2.1.0, and we see the problem on 2.2.0)
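(One more thing that may help with the investigation: polling the current ISR
against the full replica set from outside the brokers makes these episodes
visible without grepping logs. A minimal sketch using the Kafka AdminClient;
the bootstrap address and topic names below are placeholders, not the real
ones:)

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;

public class IsrSnapshot {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; point at the real cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder topic names (the masked names from the episode list).
            Map<String, TopicDescription> topics =
                admin.describeTopics(Arrays.asList("health", "living", "death")).all().get();

            for (TopicDescription topic : topics.values()) {
                for (TopicPartitionInfo p : topic.partitions()) {
                    // A partition whose ISR is smaller than its replica set is
                    // in the middle of one of these shrink/expand episodes.
                    if (p.isr().size() < p.replicas().size()) {
                        System.out.printf("%s-%d under-replicated: isr=%s replicas=%s leader=%s%n",
                            topic.name(), p.partition(), p.isr(), p.replicas(), p.leader());
                    }
                }
            }
        }
    }
}

(Run on a short interval, a few hundred ms, this should catch windows of the
1-2 second length described above and give an independent timestamp to
correlate with the broker logs.)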