Guozhang, thanks for these links.

Hi Alexis, as Guozhang said, your case seems different from ours. We
deleted a topic, and that caused ISR shrinking/expanding for other topics.
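
For context on the replica.lag.max.messages setting discussed further down
the thread, here is a minimal Python sketch of how a message-count lag
threshold can produce alternating shrink/expand events. It is a simplified
model under stated assumptions, not Kafka's actual code: the function name
isr_events and the offsets are hypothetical, and the time-based check
(replica.lag.time.max.ms) is ignored.

```python
def isr_events(leader_offsets, follower_offsets, max_lag_messages):
    """Return (step, event) tuples for a single follower.

    Assumed model: the follower stays in the ISR while
    leader_offset - follower_offset <= max_lag_messages.
    """
    in_isr = True
    events = []
    for step, (leader, follower) in enumerate(zip(leader_offsets, follower_offsets)):
        lag = leader - follower
        if in_isr and lag > max_lag_messages:
            # Follower fell too far behind: leader shrinks the ISR.
            in_isr = False
            events.append((step, "shrink"))
        elif not in_isr and lag <= max_lag_messages:
            # Follower caught up again: leader expands the ISR.
            in_isr = True
            events.append((step, "expand"))
    return events


# Hypothetical offsets: the leader receives bursts of 5000 messages and
# the follower catches up one step later.
leader = [0, 5000, 5000, 10000, 10000]
follower = [0, 0, 5000, 5000, 10000]

# Threshold below the burst size: the follower flaps in and out of the ISR.
print(isr_events(leader, follower, max_lag_messages=4000))
# Threshold above the burst size: no ISR changes.
print(isr_events(leader, follower, max_lag_messages=20000))
```

In this toy model the only thing that matters is whether a burst is larger
than the threshold, which is why the threshold has to be judged against the
actual publish pattern.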

Yifan

On Tue, Apr 5, 2016 at 10:02 PM, Alexis Midon <alexis.mi...@airbnb.com>
wrote:

> I ran into the same issue today. In a production cluster, I noticed the
> "Shrinking ISR for partition" log messages for a topic deleted 2 months
> ago.
> Our staging cluster shows the same messages for all the topics deleted in
> that cluster.
> Both are on 0.8.2.
>
> Yifan, Guozhang, did you find a way to get rid of them?
>
> thanks in advance,
> alexis
>
>
> On Tue, Apr 5, 2016 at 4:16 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
>> It is possible; there is some discussion of a similar issue in this KIP:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-53+-+Add+custom+policies+for+reconnect+attempts+to+NetworkdClient
>>
>> mailing thread:
>>
>> https://www.mail-archive.com/dev@kafka.apache.org/msg46868.html
>>
>>
>>
>> Guozhang
>>
>> On Tue, Apr 5, 2016 at 2:34 PM, Yifan Ying <nafan...@gmail.com> wrote:
>>
>> > Some updates:
>> >
>> > Yesterday, right after a release (producers and consumers reconnected to
>> > Kafka/Zookeeper, but with no code change in our producers and consumers),
>> > all under-replication issues resolved automatically and there was no more
>> > high latency in either Kafka or Zookeeper. But right after today's release
>> > (producers and consumers reconnected again), the under-replication and
>> > high-latency issues came back. So could the all-at-once reconnecting of
>> > producers and consumers cause the problem? All of this has only happened
>> > since I deleted a deprecated topic in production.
>> >
>> > Yifan
>> >
>> > On Tue, Apr 5, 2016 at 9:04 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>> >
>> >> These configs depend mainly on your publish throughput, since the
>> >> replication throughput is upper-bounded by the publish throughput. If the
>> >> publish throughput is not high, then setting lower threshold values in
>> >> these two configs will cause churn in shrinking / expanding ISRs.
>> >>
>> >> Guozhang
>> >>
>> >> On Mon, Apr 4, 2016 at 11:55 PM, Yifan Ying <nafan...@gmail.com> wrote:
>> >>
>> >>> Thanks for replying, Guozhang. We did increase both settings:
>> >>>
>> >>> replica.lag.max.messages=20000
>> >>>
>> >>> replica.lag.time.max.ms=20000
>> >>>
>> >>>
>> >>> But I'm not sure if these are good enough. And yes, it's a good
>> >>> suggestion to monitor ZK performance.
>> >>>
>> >>>
>> >>> Thanks.
>> >>>
>> >>> On Mon, Apr 4, 2016 at 8:58 PM, Guozhang Wang <wangg...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Hmm, it seems like your broker configs "replica.lag.max.messages" and
>> >>>> "replica.lag.time.max.ms" are misconfigured for your replication
>> >>>> traffic, and the deletion of the topic actually brings it below the
>> >>>> threshold. What are the config values for these two? And could you try
>> >>>> increasing these configs to see if that helps?
>> >>>>
>> >>>> In 0.8.2.1, kafka-consumer-offset-checker.sh accesses ZK to query the
>> >>>> consumer offsets one by one, so if your ZK read latency is high it
>> >>>> could take a long time. You may want to monitor your ZK cluster's
>> >>>> performance to check its read / write latencies.
>> >>>>
>> >>>>
>> >>>> Guozhang
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Apr 4, 2016 at 10:59 AM, Yifan Ying <nafan...@gmail.com> wrote:
>> >>>>
>> >>>>> Hi Guozhang,
>> >>>>>
>> >>>>> It's 0.8.2.1, so it should be fixed? We also tried to start from
>> >>>>> scratch by wiping out the data directories on both Kafka and Zookeeper.
>> >>>>> And it's odd that the constant shrinking and expanding happened after a
>> >>>>> fresh restart, along with high request latency. The brokers are using
>> >>>>> the same config as before the topic deletion.
>> >>>>>
>> >>>>> Another observation is that kafka-consumer-offset-checker.sh is
>> >>>>> extremely slow. Any suggestions would be appreciated! Thanks.
>> >>>>>
>> >>>>> On Sun, Apr 3, 2016 at 2:29 PM, Guozhang Wang <wangg...@gmail.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Yifan,
>> >>>>>>
>> >>>>>> Are you on 0.8.0 or 0.8.1/2? There are some issues with zkVersion
>> >>>>>> checking in 0.8.0 that are fixed in later minor releases of 0.8.
>> >>>>>>
>> >>>>>> Guozhang
>> >>>>>>
>> >>>>>> On Fri, Apr 1, 2016 at 7:46 PM, Yifan Ying <nafan...@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> > Hi All,
>> >>>>>> >
>> >>>>>> > We deleted a deprecated topic on our Kafka cluster (0.8) and started
>> >>>>>> > observing constant 'Expanding ISR for partition' and 'Shrinking ISR
>> >>>>>> > for partition' messages for other topics. As a result we saw a huge
>> >>>>>> > number of under-replicated partitions and very high request latency
>> >>>>>> > from Kafka. And it doesn't seem able to recover by itself.
>> >>>>>> >
>> >>>>>> > Does anyone know what caused this issue and how to resolve it?
>> >>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> -- Guozhang
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Yifan
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> -- Guozhang
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Yifan
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> -- Guozhang
>> >>
>> >
>> >
>> >
>> > --
>> > Yifan
>> >
>> >
>> >
>>
>>
>> --
>> -- Guozhang
>>
>


-- 
Yifan
