Hi Abhishek -

Perhaps it would help if you explained the motivation behind your proposal.
I know there was a bunch of discussion on KAFKA-1778, can you summarize?
Currently, I'd agree with Neha and Jay that there isn't really a strong
reason to pin the controller to a given broker or restrict it to a set of
brokers.

For rolling upgrades, it should be simpler to bounce the existing
controller last.
As for choosing a relatively lightly loaded broker, I think we should
ideally eliminate load imbalances by distributing partitions (and data
rate) as evenly as possible. If for some reason a broker cannot become
the controller (by virtue of load or something else), arguably that is a
separate problem that needs addressing.
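For what it's worth, the ordering logic such a bounce script needs is tiny. Here is a minimal sketch; `restart_order` is an illustrative name, and it assumes the broker ids and the current controller id have already been read from ZooKeeper (the /brokers/ids children and the /controller znode):

```python
def restart_order(broker_ids, controller_id):
    """Return brokers in restart order, bouncing the current controller last.

    broker_ids: list of live broker ids (e.g. from /brokers/ids)
    controller_id: id of the current controller (e.g. from /controller)
    """
    others = [b for b in broker_ids if b != controller_id]
    # Bounce every non-controller broker first; the controller then moves
    # at most once, when its own broker is finally restarted.
    if controller_id in broker_ids:
        return others + [controller_id]
    return others
```

With this ordering, a rolling bounce of an N-broker cluster triggers exactly one controller move instead of up to N.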

Thanks,
Aditya

On Tue, Oct 20, 2015 at 9:27 PM, Neha Narkhede <n...@confluent.io> wrote:

> >
> > I will update the KIP on how we can optimize the placement of controller
> > (pinning it to a preferred broker id (potentially config enabled) ) if
> that
> > sounds reasonable.
>
>
> The point I (and I think Jay too) was making is that pinning a controller
> to a broker through config is what we should stay away from. This should be
> handled by whatever tool you may be using to bounce the cluster in a
> rolling restart fashion (where it detects the current controller and
> restarts it at the very end).
>
>
> On Tue, Oct 20, 2015 at 5:35 PM, Abhishek Nigam
> <ani...@linkedin.com.invalid
> > wrote:
>
> > Hi Jay/Neha,
> > I just subscribed to the mailing list, so I read your response but did
> > not receive your email; adding the context to this email thread.
> >
> > "
> >
> > Agree with Jay on staying away from pinning roles to brokers. This is
> > actually harder to operate and monitor.
> >
> > Regarding the problems you mentioned-
> > 1. Reducing the controller moves during rolling bounce is useful but
> really
> > something that should be handled by the tooling. The root cause is that
> > currently the controller move is expensive. I think we'd be better off
> > investing time and effort in thinning out the controller. Just moving to
> > the batch write APIs in ZooKeeper will make a huge difference.
> > 2. I'm not sure I understood the motivation behind moving partitions out
> of
> > the controller broker. That seems like a proposal for a solution, but can
> > you describe the problems you saw that affected controller functionality?
> >
> > Regarding the location of the controller, it seems there are 2 things you
> > are suggesting:
> > 1. Optimizing the strategy of picking a broker as the controller (e.g.
> > least loaded node)
> > 2. Moving the controller if a broker soft fails.
> >
> > I don't think #1 is worth the effort involved. The better way of
> addressing
> > it is to make the controller thinner and faster. #2 is interesting since
> > the problem is that while a broker is failing, all state changes fail or
> > are queued up, which globally impacts the cluster. There are 2
> > alternatives - have a tool that allows you to move the controller, or
> > just kill the broker so the controller moves. I prefer the latter since
> > it is simple and also because a misbehaving broker is better off shut
> > down anyway.
> >
> > Having said that, it will be helpful to know details of the problems you
> > saw while operating the controller. I think understanding those will help
> > guide the solution better.
> >
> > On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > This seems like a step backwards--we really don't want people to
> manually
> > > manage the location of the controller and try to manually balance
> > > partitions off that broker.
> > >
> > > I think it might make sense to consider directly fixing the things you
> > > actually want to fix:
> > > 1. Too many controller moves--we could either just make this cheaper or
> > > make the controller location more deterministic e.g. having the
> election
> > > prefer the node with the smallest node id so there were fewer failovers
> > in
> > > rolling bounces.
> > > 2. You seem to think having the controller on a normal node is a
> problem.
> > > Can you elaborate on what the negative consequences you've observed?
> > Let's
> > > focus on fixing those.
> > >
> > > In general we've worked very hard to avoid having a bunch of dedicated
> > > roles for different nodes and I would be very very loath to see us move
> > > away from that philosophy. I have a fair amount of experience with both
> > > homogenous systems that have a single role and also systems with many
> > > differentiated roles and I really think that the differentiated
> approach
> > > causes more problems than it solves for most deployments due to the
> added
> > > complexity.
> > >
> > > I think we could also fix up this KIP a bit. For example it says there
> > are
> > > no public interfaces involved but surely there are new admin commands
> to
> > > control the location? There are also some minor things like listing it
> as
> > > released in 0.8.3 that seem wrong.
> > >
> > > -Jay
> > >
> > > On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> > > ani...@linkedin.com.invalid> wrote:
> > >
> > > > Hi,
> > > > Can we please discuss this KIP. The background for this is that it
> > > > allows us to pin the controller to a broker. This is useful in a
> > > > couple of scenarios:
> > > > a) If we want to do a rolling bounce, we can reduce the number of
> > > > controller moves down to 1.
> > > > b) Pick a designated broker, reduce the number of partitions on it
> > > > through admin partition reassignment, and designate it as the
> > > > controller.
> > > > c) Dynamically move the controller if we see any problems on the
> > > > broker on which it is running.
> > > >
> > > > Here is the wiki page:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> > > >
> > > > -Abhishek
> > > >
> > >
> >
> > "
> >
> > I think based on the feedback we can limit the discussion to the rolling
> > upgrade scenario and how best to address it. The only scenario I have
> > heard of where we wanted to move the controller off a broker was a bug
> > that produced multiple controllers, and that bug has since been fixed.
> >
> > I will update the KIP on how we can optimize the placement of controller
> > (pinning it to a preferred broker id (potentially config enabled) ) if
> that
> > sounds reasonable.
> > Many of the ideas of the original KIP can still apply in the limited
> scope.
> >
> > -Abhishek
> >
>
>
>
> --
> Thanks,
> Neha
>
