Thanks for the responses!

> Does that error occur continuously? Preferred leaders are the first replica
> in the assigned replica list. Could you list the topics and see the
> distribution of the first replica in all partitions?

The state we were in showed all topics led by a single broker. The preferred replica (first in the assigned replica list) was fairly evenly distributed among brokers. As mentioned, though, we could not move the leaders from that single broker to whichever broker was preferred for each topic-partition, even with the preferred replica election tool.
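The check suggested in the quoted reply (list the topics, then compare each partition's first assigned replica against its current leader) can be sketched as a small program. This is a minimal sketch. It assumes you have already copied each partition's assigned replica list and current leader out of the topic listing by hand. The `Partition` class, the method names, and the example data are hypothetical and are not part of Kafka's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PreferredLeaderCheck {

    /** One topic-partition as read off a topic listing (hypothetical model). */
    public static final class Partition {
        final String topic;
        final int partition;
        final List<Integer> replicas; // assigned replica list, in order
        final int leader;             // broker currently leading this partition

        public Partition(String topic, int partition, List<Integer> replicas, int leader) {
            this.topic = topic;
            this.partition = partition;
            this.replicas = replicas;
            this.leader = leader;
        }

        /** The preferred leader is the first replica in the assigned list. */
        int preferredLeader() {
            return replicas.get(0);
        }
    }

    /** Partitions per broker if every partition were led by its preferred replica. */
    public static Map<Integer, Integer> preferredLeaderCounts(List<Partition> parts) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Partition p : parts) {
            counts.merge(p.preferredLeader(), 1, Integer::sum);
        }
        return counts;
    }

    /** Partitions per broker as the cluster is actually led right now. */
    public static Map<Integer, Integer> currentLeaderCounts(List<Partition> parts) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Partition p : parts) {
            counts.merge(p.leader, 1, Integer::sum);
        }
        return counts;
    }

    /** Topic-partitions whose current leader is not the preferred replica. */
    public static List<String> notOnPreferred(List<Partition> parts) {
        List<String> out = new ArrayList<>();
        for (Partition p : parts) {
            if (p.leader != p.preferredLeader()) {
                out.add(p.topic + "-" + p.partition);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: preferred replicas are spread over brokers 1-3,
        // but broker 1 currently leads every partition.
        List<Partition> parts = Arrays.asList(
                new Partition("events", 0, Arrays.asList(1, 2, 3), 1),
                new Partition("events", 1, Arrays.asList(2, 3, 1), 1),
                new Partition("events", 2, Arrays.asList(3, 1, 2), 1));
        System.out.println("preferred: " + preferredLeaderCounts(parts));
        System.out.println("current:   " + currentLeaderCounts(parts));
        System.out.println("not on preferred leader: " + notOnPreferred(parts));
    }
}
```

With the made-up data above, the program prints `preferred: {1=1, 2=1, 3=1}`, `current:   {1=3}` and `not on preferred leader: [events-1, events-2]`. A balanced "preferred" map next to a lopsided "current" map is the situation described in this thread, and it is the mismatch the preferred replica election tool is meant to correct.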
> Did you see all 3 brokers registered in ZK? From the error logs it seems
> some of the brokers did not successfully startup and hence cannot take any
> partitions.

We had actually been running the system for a while without problems. One of the main culprits seems to be long GC pauses that lead to ZooKeeper timeouts on different nodes. After this, all leaders snap to a single broker. At some point the situation became unrecoverable: everything was stuck on a single broker and couldn't be moved. We did a full restart of all brokers at once (restarting in a rolling fashion didn't seem to make a difference), and eventually we got the cluster into a state where we could at least reassign leaders using that tool. However, after another big GC pause we would run into the same issue of all topics gravitating to a single broker.

We're doing a couple of things to try to avoid this scenario: moving to Java 7 with G1 garbage collection to see if we can avoid these costly pauses, and ultimately upgrading to 0.8.1.1.

Regards,
Jon

On Fri, May 16, 2014 at 9:32 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Jon,
>
> Did you see all 3 brokers registered in ZK? From the error logs it seems
> some of the brokers did not successfully startup and hence cannot take any
> partitions.
>
> Guozhang
>
>
> On Wed, May 14, 2014 at 11:45 AM, Jon Bender <jonathan.ben...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I have a 3-node cluster that has had a couple of issues lately. One thing
> > I'm trying to sort out is why the topic-partitions are all owned by a
> > single leader (when I list topics, the leader is assigned to the current
> > controller node, irrespective of the preferred replica).
> >
> > I have tried to use the preferred replica election tool per:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaLeaderElectionTool
> >
> > But I don't see any change taking place.
> >
> > The only thing I see in the logs on this machine is:
> >
> > [2014-05-14 11:37:22,146] 69818066
> > [ZkClient-EventThread-21-my.server1:2181,my.server2:2181,my.server3:2181]
> > INFO kafka.utils.ZkUtils$ - conflict in /controller data: {
> > "brokerid":1390348134, "timestamp":"1400026164986", "version":1 } stored
> > data: { "brokerid":1390348134, "timestamp":"1400026164512", "version":1 }
> > [2014-05-14 11:37:22,147] 69818067
> > [ZkClient-EventThread-21-my.server1:2181,my.server2:2181,my.server3:2181]
> > INFO kafka.utils.ZkUtils$ - I wrote this conflicted ephemeral node [{
> > "brokerid":1390348134, "timestamp":"1400026164986", "version":1 }] at
> > /controller a while back in a different session, hence I will backoff for
> > this node to be deleted by Zookeeper and retry
> >
> > Kafka version is 0.8.0.
> >
> > Any suggestions on how to get this cluster to properly rebalance?
> >
> > Cheers,
> > Jon
>
>
>
> --
> -- Guozhang
>
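The two log lines in Jon's original message come from the broker trying to create the ephemeral `/controller` node in ZooKeeper. It finds a node already there whose data names the same broker id; only the timestamp differs. The decision the log describes can be modelled roughly as follows. This is a simplified sketch, not Kafka's source. The class, enum, and method names are hypothetical, and the sketch assumes the conflict check looks only at the `brokerid` field of the stored JSON.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControllerConflict {

    /** What a broker does after its create of /controller fails because the node exists. */
    public enum Action { RETRY_NOW, BACK_OFF_AND_RETRY, OTHER_BROKER_IS_CONTROLLER }

    // Pulls the numeric "brokerid" field out of the controller JSON (simplified; not a full JSON parser).
    private static final Pattern BROKER_ID = Pattern.compile("\"brokerid\"\\s*:\\s*(\\d+)");

    static long parseBrokerId(String json) {
        Matcher m = BROKER_ID.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no brokerid in: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    public static Action onNodeExists(String storedData, long myBrokerId) {
        if (storedData == null) {
            // The node disappeared between the failed create and the read: just try again.
            return Action.RETRY_NOW;
        }
        if (parseBrokerId(storedData) == myBrokerId) {
            // Our own node from an earlier session: wait for ZooKeeper to delete it, then retry.
            return Action.BACK_OFF_AND_RETRY;
        }
        // A different broker id holds the node: another broker is the controller.
        return Action.OTHER_BROKER_IS_CONTROLLER;
    }

    public static void main(String[] args) {
        String stored = "{ \"brokerid\":1390348134, \"timestamp\":\"1400026164512\", \"version\":1 }";
        System.out.println(onNodeExists(stored, 1390348134L));
        System.out.println(onNodeExists(stored, 2L));
    }
}
```

Using the stored data from the log, the first call prints `BACK_OFF_AND_RETRY`. This corresponds to the "I will backoff ... and retry" message: the broker waits for ZooKeeper to remove the ephemeral node left over from its previous session. ZooKeeper removes ephemeral nodes only when the session that created them ends. This ties the symptom to the session timeouts after long GC pauses described earlier in the thread.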