Hi Jay/Neha,
I just subscribed to the mailing list, so I could read your responses but did
not receive them by email. I am pasting them below for context.

"

Agree with Jay on staying away from pinning roles to brokers. This is
actually harder to operate and monitor.

Regarding the problems you mentioned-
1. Reducing the controller moves during rolling bounce is useful but really
something that should be handled by the tooling. The root cause is that
currently the controller move is expensive. I think we'd be better off
investing time and effort in thinning out the controller. Just moving to
the batch write APIs in ZooKeeper will make a huge difference.
2. I'm not sure I understood the motivation behind moving partitions out of
the controller broker. That seems like a proposal for a solution, but can
you describe the problems you saw that affected controller functionality?

Regarding the location of the controller, it seems there are 2 things you
are suggesting:
1. Optimizing the strategy of picking a broker as the controller (e.g.
least loaded node)
2. Moving the controller if a broker soft fails.

I don't think #1 is worth the effort involved. The better way of addressing
it is to make the controller thinner and faster. #2 is interesting since
the problem is that while a broker fails, all state changes fail or are
queued up which globally impacts the cluster. There are 2 alternatives -
have a tool that allows you to move the controller or just kill the broker
so the controller moves. I prefer the latter since it is simple and also
because a misbehaving broker is better off shut down anyway.

Having said that, it will be helpful to know details of the problems you
saw while operating the controller. I think understanding those will help
guide the solution better.

On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:

> This seems like a step backwards--we really don't want people to manually
> manage the location of the controller and try to manually balance
> partitions off that broker.
>
> I think it might make sense to consider directly fixing the things you
> actually want to fix:
> 1. Too many controller moves--we could either just make this cheaper or
> make the controller location more deterministic, e.g. having the election
> prefer the node with the smallest node id so there are fewer failovers in
> rolling bounces.
> 2. You seem to think having the controller on a normal node is a problem.
> Can you elaborate on the negative consequences you've observed? Let's
> focus on fixing those.
>
> In general we've worked very hard to avoid having a bunch of dedicated
> roles for different nodes and I would be very very loath to see us move
> away from that philosophy. I have a fair amount of experience with both
> homogenous systems that have a single role and also systems with many
> differentiated roles and I really think that the differentiated approach
> causes more problems than it solves for most deployments due to the added
> complexity.
>
> I think we could also fix up this KIP a bit. For example it says there are
> no public interfaces involved but surely there are new admin commands to
> control the location? There are also some minor things like listing it as
> released in 0.8.3 that seem wrong.
>
> -Jay
>
> On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> ani...@linkedin.com.invalid> wrote:
>
> > Hi,
> > Can we please discuss this KIP? The background for this is that it allows
> > us to pin the controller to a broker. This is useful in a couple of
> > scenarios:
> > a) If we want to do a rolling bounce, we can reduce the number of
> > controller moves down to 1.
> > b) Again pick a designated broker, reduce the number of partitions on it
> > through admin reassign partitions, and designate it as the controller.
> > c) Dynamically move the controller if we see any problems on the broker
> > on which it is running.
> >
> > Here is the wiki page
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> >
> > -Abhishek
> >
>

"

Based on the feedback, I think we can limit the discussion to the rolling
upgrade scenario and how best to address it. The only case I have heard of
where we wanted to move the controller off a broker was caused by a bug that
left us with multiple controllers, and that bug has since been fixed.

If that sounds reasonable, I will update the KIP to describe how we can
optimize controller placement by pinning it to a preferred broker id
(potentially enabled through config).
Many of the ideas in the original KIP still apply within this limited scope.
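
As a rough illustration only (this is not from the KIP and not Kafka code),
preferred-broker placement could combine the optional pin with the
"smallest node id" fallback suggested earlier in the thread. The function
name and the preferred_broker_id parameter are hypothetical.

```python
from typing import Iterable, Optional


def pick_controller(live_broker_ids: Iterable[int],
                    preferred_broker_id: Optional[int] = None) -> int:
    """Choose a controller from the live brokers (illustrative sketch).

    preferred_broker_id is a hypothetical, optional setting. If that broker
    is alive it is chosen; otherwise the live broker with the smallest id
    is chosen, so the outcome is deterministic.
    """
    live = set(live_broker_ids)
    if not live:
        raise ValueError("no live brokers to elect a controller from")
    if preferred_broker_id is not None and preferred_broker_id in live:
        return preferred_broker_id
    # Fallback: deterministic choice of the smallest live broker id.
    return min(live)


print(pick_controller([3, 1, 2]))                         # 1
print(pick_controller([3, 1, 2], preferred_broker_id=3))  # 3
print(pick_controller([1, 2], preferred_broker_id=3))     # 1
```

With the pin left unset this reduces to the smallest-id rule, so the
preference could stay off by default.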

-Abhishek
