kafka-preferred-replica-election.sh is only used to move leaders between
brokers. As long as a broker in metadata.broker.list, i.e. the second
broker list I mentioned in the previous email, is still alive, the producer
can learn about the leader change from it.

In terms of broker discovery, I think it depends on how you "define" the
feature. For example, say there are originally 3 brokers 1, 2 and 3, and you
start the producer with metadata list = {1,2,3}; later on another three
brokers 4, 5 and 6 are added. The producer can still find these newly added
brokers. It is just that if all the brokers in the metadata list, i.e.
1, 2 and 3, are gone, then the producer will not be able to refresh its metadata.
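
To make this concrete, here is a rough sketch using the 0.8 Java producer
(the broker host names and the topic name are just placeholders):

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class BootstrapListExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Fixed bootstrap list, read once at startup: brokers 1, 2 and 3.
            props.put("metadata.broker.list",
                    "broker1:9092,broker2:9092,broker3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Periodic metadata refresh; this is how brokers 4, 5 and 6 are
            // discovered after they join the cluster.
            props.put("topic.metadata.refresh.interval.ms", "60000");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));

            // Sends go to whatever leaders the refreshed metadata reports,
            // including the newly added brokers. But if brokers 1, 2 and 3
            // are all gone, the metadata refresh itself has no broker to ask.
            producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
            producer.close();
        }
    }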

Guozhang


On Wed, Feb 26, 2014 at 11:04 AM, Christofer Hedbrandh <
christo...@knewton.com> wrote:

> Thanks for your response Guozhang.
>
> I did make sure that new metadata is fetched before taking out the old
> broker. I set topic.metadata.refresh.interval.ms to something very low,
> and I confirmed in the producer log that new metadata is actually fetched
> after the new broker is brought up and before the old broker is taken
> down. Does this not mean that the dynamic current-brokers list would hold
> the new broker at this point?
>
> If you are saying that the dynamic current-brokers list is never used for
> fetching metadata, this does not explain why the producer does NOT fail
> when kafka-preferred-replica-election.sh makes the new broker become the
> leader.
>
> Lastly, if broker discovery is not a producer feature in 0.8.0 Release, and
> I have to "make sure at least one broker in the list is alive during the
> rolling bounce", is this a feature you are considering for future versions?
>
>
>
> On Wed, Feb 26, 2014 at 12:04 PM, Guozhang Wang <wangg...@gmail.com>
> wrote:
>
> > Hello Chris,
> >
> > The metadata.broker.list, once read in at startup time, will not be
> > changed. In other words, during the lifetime of a producer it has two
> > lists of brokers:
> >
> > 1. The current brokers in the cluster that are returned in the metadata
> > request response; this list is dynamic.
> >
> > 2. The broker list that is used for bootstrapping; this is read from
> > metadata.broker.list and is fixed. This list could, for example, be a VIP,
> > and a hardware load balancer behind it would distribute the metadata
> > requests to the brokers.
> >
> > So in your case the metadata list only has broker B; once it is taken
> > out, the producer fails to send messages to it and then tries to refresh
> > its metadata, but it has no broker to go to.
> >
> > Therefore, when you are trying to do a rolling bounce of the cluster to,
> > for example, do an in-place upgrade, you need to make sure at least one
> > broker in the list is alive during the rolling bounce.
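> >
> > As a sketch, the bootstrap part of the producer config could look like
> > this (using java.util.Properties; the host names below are made up):
> >
> >     Properties props = new Properties();
> >     // Point the bootstrap list at a VIP that always resolves to a live
> >     // broker, or list several brokers so at least one is up during the
> >     // bounce.
> >     props.put("metadata.broker.list", "kafka-vip.example.com:9092");
> >     // props.put("metadata.broker.list", "brokerA:9092,brokerB:9092");
> >     // Give sends enough retries and backoff to ride out a single restart.
> >     props.put("message.send.max.retries", "10");
> >     props.put("retry.backoff.ms", "500");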
> >
> > Hope this helps.
> >
> > Guozhang
> >
> >
> >
> >
> >
> > On Wed, Feb 26, 2014 at 8:19 AM, Christofer Hedbrandh <
> > christo...@knewton.com> wrote:
> >
> > > Hi all,
> > >
> > > I ran into a problem with the Kafka producer when attempting to replace
> > > all the nodes in a 0.8.0 Beta1 Release Kafka cluster with 0.8.0 Release
> > > nodes. I started a producer/consumer test program to measure the
> > > cluster's performance during the process; I added new brokers, ran
> > > kafka-reassign-partitions.sh, and removed the old brokers. When I
> > > removed the old brokers my producer failed.
> > >
> > > The simplest scenario I could come up with where I still see this
> > > behavior is this: using version 0.8.0 Release, we have a 1-partition
> > > topic with 2 replicas on 2 brokers, broker A and broker B. Broker A is
> > > taken down. A producer is started with only broker B in
> > > metadata.broker.list. Broker A is brought back up. We let
> > > topic.metadata.refresh.interval.ms amount of time pass. Broker B is
> > > taken down, and we get kafka.common.FailedToSendMessageException after
> > > all the (many) retries have failed.
> > >
> > > During my experimentation I have made sure that the producer fetches
> > > metadata before the old broker is taken down. And I have made sure that
> > > enough retries with enough backoff time were used for the producer to
> > > not give up prematurely.
> > >
> > > The documentation for the producer config metadata.broker.list suggests
> > > to me that this list of brokers is only used at startup: "This is for
> > > bootstrapping and the producer will only use it for getting metadata
> > > (topics, partitions and replicas)". And when I read about
> > > topic.metadata.refresh.interval.ms and retry.backoff.ms I learn that
> > > metadata is indeed fetched at later times. Based on this documentation,
> > > I make the assumption that the producer would learn about any new
> > > brokers when new metadata is fetched.
> > >
> > > I also want to point out that the cluster seems to work just fine
> > > during this process; it only seems to be a problem with the producer.
> > > Between all these steps I run kafka-list-topic.sh and try the console
> > > producer and consumer, and everything is as expected.
> > >
> > > Also, I found another interesting thing when experimenting with running
> > > kafka-preferred-replica-election.sh before taking down the old broker.
> > > This script only causes any changes when the leader and the preferred
> > > replica are different. In the scenario where they are in fact different,
> > > and the new broker takes the role of leader from the old broker, the
> > > producer does NOT fail. This makes me think that perhaps the producer
> > > only keeps metadata about topic leaders, and not about all replicas as
> > > the documentation suggests to me.
> > >
> > > It is clear that I am making a lot of assumptions here, and I am
> > > relatively new to Kafka, so I could very well be missing something
> > > important. The way I see it, there are a few possibilities.
> > >
> > > 1. Broker discovery is a supposed producer feature, and it has a bug.
> > > 2. Broker discovery is not a producer feature, in which case I think
> > > many people might benefit from clearer documentation.
> > > 3. I am doing something dumb, e.g. forgetting about some important
> > > configuration.
> > >
> > > Please let me know what you make of this.
> > >
> > > Thanks,
> > > Christofer Hedbrandh
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
