log.cleanup.policy is delete, not compact. The relevant broker settings are:

log.cleaner.enable=true
log.cleaner.threads=5
log.cleanup.policy=delete
log.flush.scheduler.interval.ms=3000
log.retention.minutes=1440
log.segment.bytes=1073741824   (1 GB)
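Note that a topic-level override can shadow the broker default, so it may be worth confirming the effective cleanup.policy on the affected topic. A quick check, assuming a 0.8.x-era broker where topics are described via ZooKeeper reachable at localhost:2181:

  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic Topic22kv

The Configs: field in the output lists any per-topic overrides (e.g. cleanup.policy=compact); if it is empty, the broker-level log.cleanup.policy=delete applies.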
Messages are keyed but not compressed; the producer is async and uses the Kafka default partitioner.

// fragment from the producer code; producer, msg, rnd and getPartitionKey() are defined elsewhere
String message = msg.getString();
String uniqKey = "" + rnd.nextInt();   // random key
String partKey = getPartitionKey();    // partition key
KeyedMessage<String, String> data =
    new KeyedMessage<String, String>(this.topicName, uniqKey, partKey, message);
producer.send(data);

Thanks
Zakee

> On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
>
> Is your topic log compacted? Also, if it is, are the messages keyed? Or are the messages compressed?
>
> Thanks,
>
> Mayuresh
>
> Sent from my iPhone
>
>> On Mar 14, 2015, at 2:02 PM, Zakee <kzak...@netzero.net> wrote:
>>
>> Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven partition leader rebalance issue. The logs are much cleaner now.
>>
>> There are a few incidents of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. I was trying to relate it to a cause; it looks like compaction (log segment deletion) is causing this. Not sure whether this is expected behavior.
>>
>> Broker-4:
>> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
>>
>> Broker-3:
>> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
>> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
>> …
>> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
>> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
>> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
>>
>> <topic22kv_746a_314_logs.txt>
>>
>> Thanks
>> Zakee
>>
>>> On Mar 9, 2015, at 12:18 PM, Zakee <kzak...@netzero.net> wrote:
>>>
>>> No broker restarts.
>>>
>>> Created a Kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
>>>
>>>>> Logs for rebalance:
>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
>>>>> …
>>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
>>>>> ...
>>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
>>>>>
>>>>> Also, I still see lots of the below errors (~69k) in the logs since the restart. Is there any reason other than the rebalance for these errors?
>>>>>
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>
>>>> Could you paste the related logs in controller.log?
>>> What specifically should I search for in the logs?
>>>
>>> Thanks,
>>> Zakee
>>>
>>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>
>>>> Is there anything wrong with the brokers around that time, e.g. a broker restart?
>>>> The logs you pasted are actually from the replica fetchers. Could you paste the related logs in controller.log?
>>>>
>>>> Thanks.
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>>> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>
>>>>> Correction: Actually the rebalance kept happening until about 24 hours after the start, and that is where the errors below were found. Ideally the rebalance should not have happened at all.
>>>>>
>>>>> Thanks
>>>>> Zakee
>>>>>
>>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
>>>>>>>
>>>>>>> Hmm, that sounds like a bug. Can you paste the log of the leader rebalance here?
>>>>>> Thanks for your suggestions.
>>>>>> It looks like the rebalance actually happened only once, soon after I started with a clean cluster and data was pushed; it didn't happen again so far, and I see the partition leader counts on the brokers have not changed since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?
>>>>>>
>>>>>> Also, I still see lots of the below errors (~69k) in the logs since the restart. Is there any reason other than the rebalance for these errors?
>>>>>>
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>
>>>>>>> Some other things to check are:
>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You probably already know this, just to double-confirm.
>>>>>> Yes
>>>>>>
>>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
>>>>>> ls /admin
>>>>>> [delete_topics]
>>>>>> ls /admin/preferred_replica_election
>>>>>> Node does not exist: /admin/preferred_replica_election
>>>>>>
>>>>>> Thanks
>>>>>> Zakee
>>>>>>
>>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>
>>>>>>> Hmm, that sounds like a bug. Can you paste the log of the leader rebalance here?
>>>>>>> Some other things to check are:
>>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You probably already know this, just to double-confirm.
>>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
>>>>>>>
>>>>>>> Jiangjie (Becket) Qin
>>>>>>>
>>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>
>>>>>>>> I started with a clean cluster and started to push data. It still does the rebalance at random intervals even though auto.leader.rebalance.enable is set to false.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zakee
>>>>>>>>
>>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>>>
>>>>>>>>> Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto.leader.rebalance.enable disabled and try pushing data?
>>>>>>>>> When leader migration occurs, a NotLeaderForPartition exception is expected.
>>>>>>>>>
>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>
>>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" messages in the logs. I also see a lot of Produce request failure warnings with the NotLeader exception.
>>>>>>>>>>
>>>>>>>>>> I tried setting auto.leader.rebalance.enable to false. I am still noticing the rebalance happening. My understanding was that the rebalance will not happen when this is set to false.
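For reference, the broker-side settings involved here, as a minimal server.properties sketch (the two interval/percentage values are the stock defaults, shown only for illustration; they are ignored once the feature is off):

  # disable automatic preferred-replica (leader) rebalancing
  auto.leader.rebalance.enable=false
  # consulted only when the feature is enabled
  leader.imbalance.check.interval.seconds=300
  leader.imbalance.per.broker.percentage=10

With the feature disabled, a preferred replica election should only happen when triggered explicitly, e.g. via the kafka-preferred-replica-election.sh tool or by creation of the /admin/preferred_replica_election znode, which is why checking that path in ZooKeeper (as above) is a useful sanity check.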
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Zakee
>>>>>>>>>>
>>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I don't think num.replica.fetchers will help in this case. Increasing the number of fetcher threads only helps when you have a large amount of data coming into a broker and more replica fetcher threads help keep up. We usually use only 1-2 per broker. But in your case, it looks like leader migration is causing the issue.
>>>>>>>>>>> Do you see anything else in the log? Like a preferred leader election?
>>>>>>>>>>>
>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>
>>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I do see under-replicated partitions shooting up, usually every hour. Anything I could try to reduce it?
>>>>>>>>>>>>
>>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? It is currently configured as 7 on each of the 5 brokers.
>>>>>>>>>>>>
>>>>>>>>>>>> -Zakee
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> These messages are usually caused by leader migration. I think as long as you don't see this lasting forever along with a bunch of under-replicated partitions, it should be fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Need to know whether I should be worried about these or ignore them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not sure what causes them and what could be done to fix them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Zakee
>>>
>>> Thanks
>>> Zakee