It looks like, in your case, broker 1 somehow missed a LeaderAndIsrRequest
from the controller for [ad_click_sts,4]. From that point on, the zkVersion
cached on the broker differs from the value stored in ZooKeeper, so
broker 1 fails to update the ISR. In this case you have to bounce the
broker to fix it.
From what you posted, it looks like both broker 0 and broker 1 are having
this issue. So the question is how both brokers could have missed a
controller LeaderAndIsrRequest. Is there anything interesting in controller.log?
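
To illustrate the zkVersion mismatch described above, here is a minimal
sketch of a version-checked (conditional) update. It is a toy in-memory
model, not Kafka or ZooKeeper client code: VersionedStore, Broker, and
refresh() are hypothetical names introduced only for this example, and
refresh() simply stands in for whatever re-syncs the broker (a bounce, in
this thread).

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedStore:
    """Toy in-memory stand-in for a versioned znode store (hypothetical)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]

    def conditional_update(self, path, data, expected_version):
        # Write only if the caller's expected version matches the stored one.
        _, current = self._nodes[path]
        if current != expected_version:
            raise BadVersionError(f"expected {expected_version}, found {current}")
        self._nodes[path] = (data, current + 1)
        return current + 1


class Broker:
    """Toy broker that caches the version it last saw (hypothetical)."""

    def __init__(self, store, path):
        self.store = store
        self.path = path
        _, self.cached_version = store.get(path)

    def try_update_isr(self, isr):
        try:
            self.cached_version = self.store.conditional_update(
                self.path, {"isr": isr}, self.cached_version
            )
            return True
        except BadVersionError:
            return False  # corresponds to "skip updating ISR" in the log

    def refresh(self):
        # Assumption: models the broker re-reading state, e.g. after a restart.
        _, self.cached_version = self.store.get(self.path)


store = VersionedStore()
path = "/brokers/topics/ad_click_sts/partitions/4/state"
store.create(path, {"isr": [1, 0]})
broker = Broker(store, path)

# Someone else (the controller, in this model) writes new state,
# but the broker never learns about the new version.
store.conditional_update(path, {"isr": [1]}, 0)

first_attempt = broker.try_update_isr([1, 0])   # stale cached version
broker.refresh()
second_attempt = broker.try_update_isr([1, 0])  # version is current again
```

In this model the first attempt is rejected for as long as the cached
version stays stale, which is why retrying alone never helps; only
re-syncing the cached version lets the update go through.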

Jiangjie (Becket) Qin

On 3/10/15, 8:33 PM, "sy.pan" <shengyi....@gmail.com> wrote:

>@tao xiao and Jiangjie Qin, thank you very much.
>
>I tried running kafka-reassign-partitions.sh, but the issue still exists…
>
>This is the log output:
>
>[2015-03-11 11:00:40,086] ERROR Conditional update of path
>/brokers/topics/ad_click_sts/partitions/4/state with data
>{"controller_epoch":23,"leader":1,"version":1,"leader_epoch":35,"isr":[1,0]}
>and expected version 564 failed due to
>org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode
>= BadVersion for /brokers/topics/ad_click_sts/partitions/4/state
>(kafka.utils.ZkUtils$)
>
>[2015-03-11 11:00:40,086] INFO Partition [ad_click_sts,4] on broker 1:
>Cached zkVersion [564] not equal to that in zookeeper, skip updating ISR
>(kafka.cluster.Partition)
>>>>>>>>>>>>>>>>>>>
>
>Finally, I had to restart the Kafka node, which fixed the ISR problem. Is
>there a better way?
>
>Regards
>sy.pan
>
>
>> On March 11, 2015, at 03:34, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>> 
>> This looks like a leader broker somehow did not respond to a fetch
>> request from the follower. It may be because the broker was too busy.
>> If that is the case, Xiao's approach could help - reassign partitions
>> or reelect leaders to balance the traffic among brokers.
>> 
>> Jiangjie (Becket) Qin
>> 
>> On 3/9/15, 8:31 PM, "sy.pan" <shengyi....@gmail.com> wrote:
>> 
>>> Hi, tao xiao and Jiangjie Qin
>>> 
>>> I encountered the same issue. My node had recovered from a high-load
>>> problem (caused by another application).
>>> 
>>> This is the kafka-topics output:
>>> 
>>> Topic:ad_click_sts  PartitionCount:6        ReplicationFactor:2     Configs:
>>>     Topic: ad_click_sts     Partition: 0    Leader: 1       Replicas: 1,0   Isr: 1
>>>     Topic: ad_click_sts     Partition: 1    Leader: 0       Replicas: 0,1   Isr: 0
>>>     Topic: ad_click_sts     Partition: 2    Leader: 1       Replicas: 1,0   Isr: 1
>>>     Topic: ad_click_sts     Partition: 3    Leader: 0       Replicas: 0,1   Isr: 0
>>>     Topic: ad_click_sts     Partition: 4    Leader: 1       Replicas: 1,0   Isr: 1
>>>     Topic: ad_click_sts     Partition: 5    Leader: 0       Replicas: 0,1   Isr: 0
>>> 
>>> ReplicaFetcherThread info extracted from kafka server.log :
>>> 
>>> [2015-03-09 21:06:05,450] ERROR [ReplicaFetcherThread-0-0], Error in
>>> fetch Name: FetchRequest; Version: 0; CorrelationId: 7331; ClientId:
>>> ReplicaFetcherThread-0-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1
>>> bytes; RequestInfo: [ad_click_sts,5] ->
>>> PartitionFetchInfo(6149699,1048576),[ad_click_sts,3] ->
>>> PartitionFetchInfo(6147835,1048576),[ad_click_sts,1] ->
>>> PartitionFetchInfo(6235071,1048576) (kafka.server.ReplicaFetcherThread)
>>> java.net.SocketTimeoutException
>>>       at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
>>>       at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
>>>       ...
>>>       at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
>>>       at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>>>       at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>>>       at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>>>       at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
>>>       at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>>>       at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>>>       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>>> 
>>> [2015-03-09 21:06:05,450] WARN Reconnect due to socket error: null
>>> (kafka.consumer.SimpleConsumer)
>>> 
>>> [2015-03-09 21:05:57,116] INFO Partition [ad_click_sts,4] on broker 1:
>>> Cached zkVersion [556] not equal to that in zookeeper, skip updating ISR
>>> (kafka.cluster.Partition)
>>> 
>>> [2015-03-09 21:06:05,772] INFO Partition [ad_click_sts,2] on broker 1:
>>> Shrinking ISR for partition [ad_click_sts,2] from 1,0 to 1
>>> (kafka.cluster.Partition)
>>> 
>>> 
>>> How can I fix this ISR problem? Is there a command that can be run?
>>> 
>>> Regards
>>> sy.pan
>
