Alexis,

Hmm, yours looks like a bug in the Kafka brokers, since your log message
refers to a topic that was deleted months ago, indicating that the topic was
not deleted cleanly. Could you file a JIRA with the server logs for further
investigation?

Guozhang


On Tue, Apr 5, 2016 at 10:02 PM, Alexis Midon <
alexis.mi...@airbnb.com.invalid> wrote:

> I ran into the same issue today. In a production cluster, I noticed the
> "Shrinking ISR for partition" log messages for a topic deleted 2 months
> ago.
> Our staging cluster shows the same messages for all the topics deleted in
> that cluster.
> Both clusters are on 0.8.2.
>
> Yifan, Guozhang, did you find a way to get rid of them?
>
> thanks in advance,
> alexis
>
>
> On Tue, Apr 5, 2016 at 4:16 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > It is possible; there is some discussion of a similar issue in this KIP:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-53+-+Add+custom+policies+for+reconnect+attempts+to+NetworkdClient
> >
> > mailing thread:
> >
> > https://www.mail-archive.com/dev@kafka.apache.org/msg46868.html
> >
> >
> >
> > Guozhang
> >
> > On Tue, Apr 5, 2016 at 2:34 PM, Yifan Ying <nafan...@gmail.com> wrote:
> >
> > > Some updates:
> > >
> > > Yesterday, right after a release (producers and consumers reconnected to
> > > Kafka/Zookeeper, but with no code changes in our producers and consumers),
> > > all under-replication issues were resolved automatically, and there was no
> > > more high latency in either Kafka or Zookeeper. But right after today's
> > > release (producers and consumers reconnected again), the under-replication
> > > and high-latency issues happened again. So could the all-at-once
> > > reconnecting of producers and consumers be causing the problem? And all of
> > > this has only happened since I deleted a deprecated topic in production.
> > >
> > > Yifan
> > >
> > > On Tue, Apr 5, 2016 at 9:04 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> > >
> > >> These configs mainly depend on your publish throughput, since the
> > >> replication throughput is upper-bounded by the publish throughput. If the
> > >> publish throughput is not high, then setting lower threshold values in
> > >> these two configs will cause churn in shrinking / expanding ISRs.
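As a rough illustration of why these two thresholds interact with publish throughput, here is a sketch (not the actual broker code; the function and parameter names are made up) of the 0.8.x-style check a leader applies when deciding whether a follower has fallen out of the ISR:

```python
# Illustrative sketch only, NOT Kafka's implementation. In 0.8.x a follower
# is dropped from the ISR if it is "slow" (too many messages behind, per
# replica.lag.max.messages) or "stuck" (no fetch within
# replica.lag.time.max.ms). High publish throughput makes the message-lag
# check easy to trip, which is why the thresholds depend on traffic.

def is_out_of_sync(leader_log_end_offset, follower_log_end_offset,
                   now_ms, last_fetch_time_ms,
                   replica_lag_max_messages=20000,
                   replica_lag_time_max_ms=20000):
    """Return True if the follower should be shrunk out of the ISR."""
    messages_behind = leader_log_end_offset - follower_log_end_offset
    slow = messages_behind > replica_lag_max_messages
    stuck = (now_ms - last_fetch_time_ms) > replica_lag_time_max_ms
    return slow or stuck

# A follower 25,000 messages behind trips the default message-lag threshold:
print(is_out_of_sync(100000, 75000, now_ms=1000, last_fetch_time_ms=900))
# prints True
```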
> > >>
> > >> Guozhang
> > >>
> > >> On Mon, Apr 4, 2016 at 11:55 PM, Yifan Ying <nafan...@gmail.com>
> wrote:
> > >>
> > >>> Thanks for replying, Guozhang. We did increase both settings:
> > >>>
> > >>> replica.lag.max.messages=20000
> > >>>
> > >>> replica.lag.time.max.ms=20000
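For reference, these two settings go in each broker's server.properties (in 0.8.x a broker restart is needed for them to take effect), e.g.:

```properties
# ISR lag thresholds (values from this thread)
replica.lag.max.messages=20000
replica.lag.time.max.ms=20000
```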
> > >>>
> > >>>
> > >>> But I'm not sure if these are good enough. And yes, that's a good
> > >>> suggestion to monitor ZK performance.
> > >>>
> > >>>
> > >>> Thanks.
> > >>>
> > >>> On Mon, Apr 4, 2016 at 8:58 PM, Guozhang Wang <wangg...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hmm, it seems like your broker configs "replica.lag.max.messages" and
> > >>>> "replica.lag.time.max.ms" are misconfigured for your replication
> > >>>> traffic, and the deletion of the topic actually pushed it below the
> > >>>> threshold. What are the values for these two configs? Could you try
> > >>>> increasing them and see if that helps?
> > >>>>
> > >>>> In 0.8.2.1, kafka-consumer-offset-checker.sh accesses ZK to query the
> > >>>> consumer offsets one by one, so if your ZK read latency is high it can
> > >>>> take a long time. You may want to monitor your ZK cluster's performance
> > >>>> to check its read / write latencies.
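One lightweight way to watch those latencies is ZooKeeper's "mntr" four-letter command (e.g. `echo mntr | nc zk-host 2181`). Below is a small illustrative parser for its tab-separated output (not part of Kafka; the sample values are made up):

```python
# Parse the output of ZooKeeper's "mntr" four-letter command, which emits
# one "key<TAB>value" pair per line, and pick out the latency metrics.

def parse_mntr(output):
    """Parse mntr output into a dict of metric name -> value (as strings)."""
    stats = {}
    for line in output.strip().splitlines():
        key, _, value = line.partition("\t")
        stats[key] = value
    return stats

# Hypothetical sample output for illustration:
sample = (
    "zk_avg_latency\t3\n"
    "zk_max_latency\t120\n"
    "zk_min_latency\t0\n"
    "zk_outstanding_requests\t0\n"
)
stats = parse_mntr(sample)
print(stats["zk_max_latency"])  # prints 120
```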
> > >>>>
> > >>>>
> > >>>> Guozhang
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Mon, Apr 4, 2016 at 10:59 AM, Yifan Ying <nafan...@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hi Guozhang,
> > >>>>>
> > >>>>> It's 0.8.2.1, so it should be fixed? We also tried to start from
> > >>>>> scratch by wiping out the data directories on both Kafka and
> > >>>>> Zookeeper. Oddly, the constant shrinking and expanding, as well as the
> > >>>>> high request latency, happened even after the fresh restart. The
> > >>>>> brokers are using the same config as before the topic deletion.
> > >>>>>
> > >>>>> Another observation: running kafka-consumer-offset-checker.sh is
> > >>>>> extremely slow. Any suggestions would be appreciated! Thanks.
> > >>>>>
> > >>>>> On Sun, Apr 3, 2016 at 2:29 PM, Guozhang Wang <wangg...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Yifan,
> > >>>>>>
> > >>>>>> Are you on 0.8.0 or 0.8.1/2? There are some issues with zkVersion
> > >>>>>> checking
> > >>>>>> in 0.8.0 that are fixed in later minor releases of 0.8.
> > >>>>>>
> > >>>>>> Guozhang
> > >>>>>>
> > >>>>>> On Fri, Apr 1, 2016 at 7:46 PM, Yifan Ying <nafan...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>> > Hi All,
> > >>>>>> >
> > >>>>>> > We deleted a deprecated topic on a Kafka cluster (0.8) and started
> > >>>>>> > observing constant 'Expanding ISR for partition' and 'Shrinking ISR
> > >>>>>> > for partition' messages for other topics. As a result we saw a huge
> > >>>>>> > number of under-replicated partitions and very high request latency
> > >>>>>> > from Kafka. And the cluster doesn't seem able to recover on its own.
> > >>>>>> >
> > >>>>>> > Does anyone know what caused this issue and how to resolve it?
> > >>>>>> >
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> -- Guozhang
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Yifan
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> -- Guozhang
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Yifan
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> > >
> > >
> > > --
> > > Yifan
> > >
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
