Thanks for the responses!

>Does that error occur continuously? Preferred leaders are the first replica
>in the assigned replica list. Could you list the topics and see the
>distribution of the first replica in all partitions?
The state we were in showed all topics led by a single broker.  The
preferred replica (first in the assigned replica list) was fairly evenly
distributed among brokers, but as mentioned we could not move the leaders
from the single broker to whichever was preferred for that topic-partition,
even with the preferred replica election tool.
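
For anyone wanting to check the same thing, here is a rough sketch of how
the leader and preferred-replica distributions could be tallied from the
topic listing. It assumes each line looks like
"topic: X  partition: N  leader: B  replicas: a,b,c  isr: ..." (the shape I
recall from the 0.8.0 listing tool; adjust the pattern if yours differs).
The function name and sample data are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape from the topic listing output (not verified against
# every Kafka version):
#   topic: events  partition: 0  leader: 1  replicas: 1,2,3  isr: 1,2,3
LINE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]+)"
)


def leader_report(listing):
    """Count current leaders and preferred replicas per broker.

    Returns (leaders, preferred, misplaced), where misplaced lists the
    (topic, partition, leader, preferred) tuples whose leader is not the
    first broker in the assigned replica list.
    """
    leaders = Counter()
    preferred = Counter()
    misplaced = []
    for line in listing.splitlines():
        m = LINE.search(line)
        if m is None:
            continue  # skip lines that do not match the assumed shape
        leader = int(m.group("leader"))
        first = int(m.group("replicas").split(",")[0])
        leaders[leader] += 1
        preferred[first] += 1
        if leader != first:
            misplaced.append(
                (m.group("topic"), int(m.group("partition")), leader, first)
            )
    return leaders, preferred, misplaced


# Hypothetical listing: every partition led by broker 1, while the
# preferred replicas are spread across brokers 1, 2 and 3.
SAMPLE = (
    "topic: events\tpartition: 0\tleader: 1\treplicas: 1,2,3\tisr: 1,2,3\n"
    "topic: events\tpartition: 1\tleader: 1\treplicas: 2,3,1\tisr: 1,2,3\n"
    "topic: events\tpartition: 2\tleader: 1\treplicas: 3,1,2\tisr: 1,2,3\n"
)

if __name__ == "__main__":
    leaders, preferred, misplaced = leader_report(SAMPLE)
    print("leaders per broker:  ", dict(leaders))
    print("preferred per broker:", dict(preferred))
    print("partitions not on preferred leader:", len(misplaced))
```

A skewed "leaders" count next to an even "preferred" count is the situation
described above.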

>Did you see all 3 brokers registered in ZK? From the error logs it seems
>some of the brokers did not successfully startup and hence cannot take any
>partitions.
We had actually been running the system for a while without problems.

One of the main culprits seems to be long GC pauses leading to Zookeeper
session timeouts on different nodes.  After this, all leaders snap to a
single broker.  At some point this became unrecoverable: everything was
stuck on a single broker and couldn't be moved.  We did a full restart of
all brokers at once (a rolling restart didn't seem to make a difference)
and eventually got the cluster into a state where we could reassign leaders
using that tool.  However, after another big GC pause we would run into the
same issue of all topics gravitating to a single broker.
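
A rough sketch of how such pauses could be picked out of a GC log, assuming
the brokers run with -XX:+PrintGCApplicationStoppedTime so the log contains
"Total time for which application threads were stopped: N seconds" lines.
The 6000 ms default below is an assumption standing in for the broker's
zookeeper.session.timeout.ms; use whatever your brokers are configured with.
The function name and sample lines are made up for illustration.

```python
import re

# Line written by HotSpot when -XX:+PrintGCApplicationStoppedTime is set.
# Only the first duration on the line is read.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:[.,]\d+)?) seconds"
)


def long_pauses(gc_log_lines, session_timeout_ms=6000):
    """Return (pause_ms, line) for pauses at or above the session timeout.

    session_timeout_ms is an assumed stand-in for the broker's
    zookeeper.session.timeout.ms setting.
    """
    hits = []
    for line in gc_log_lines:
        m = STOPPED.search(line)
        if m is None:
            continue
        # Some locales print a decimal comma, so normalise before parsing.
        pause_ms = float(m.group(1).replace(",", ".")) * 1000.0
        if pause_ms >= session_timeout_ms:
            hits.append((pause_ms, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt: one short pause and one long one.
    sample = [
        "Total time for which application threads were stopped: 0.0150000 seconds",
        "Total time for which application threads were stopped: 7.2500000 seconds",
    ]
    for pause_ms, line in long_pauses(sample):
        print("pause of %.0f ms: %s" % (pause_ms, line))
```

A pause at or above the session timeout is long enough for Zookeeper to
expire the broker's session, which is when leadership moves.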

We're doing a couple of things to try to avoid this scenario: moving to
Java 7 with G1 garbage collection to see if we can avoid these costly
pauses, and ultimately upgrading to 0.8.1.1.

Regards,
Jon


On Fri, May 16, 2014 at 9:32 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Jon,
>
> Did you see all 3 brokers registered in ZK? From the error logs it seems
> some of the brokers did not successfully startup and hence cannot take any
> partitions.
>
> Guozhang
>
>
> On Wed, May 14, 2014 at 11:45 AM, Jon Bender <jonathan.ben...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a 3-node cluster that has had a couple of issues lately.  One
> > thing I'm trying to sort out is why the topic-partitions are all owned
> > by a single leader (when I list topics, the leader is assigned to the
> > current controller node, irrespective of the preferred replica).
> >
> > I have tried to use the preferred replica election tool per:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-2.PreferredReplicaLeaderElectionTool
> >
> > But I don't see any change taking place.
> >
> > Only thing I see in the logs on this machine is:
> >
> > [2014-05-14 11:37:22,146]  69818066
> > [ZkClient-EventThread-21-my.server1:2181,my.server2:2181,my.server3:2181]
> > INFO  kafka.utils.ZkUtils$  - conflict in /controller data: {
> > "brokerid":1390348134, "timestamp":"1400026164986", "version":1 } stored
> > data: { "brokerid":1390348134, "timestamp":"1400026164512", "version":1 }
> > [2014-05-14 11:37:22,147]  69818067
> > [ZkClient-EventThread-21-my.server1:2181,my.server2:2181,my.server3:2181]
> > INFO  kafka.utils.ZkUtils$  - I wrote this conflicted ephemeral node [{
> > "brokerid":1390348134, "timestamp":"1400026164986", "version":1 }] at
> > /controller a while back in a different session, hence I will backoff for
> > this node to be deleted by Zookeeper and retry
> >
> > Kafka version is 0.8.0.
> >
> > Any suggestions on how to get this cluster to properly rebalance?
> >
> > Cheers,
> > Jon
> >
>
>
>
> --
> -- Guozhang
>
