Agree with Jay on staying away from pinning roles to brokers; dedicated roles are actually harder to operate and monitor.
Regarding the problems you mentioned:

1. Reducing the number of controller moves during a rolling bounce is useful, but it is really something that should be handled by the tooling. The root cause is that a controller move is currently expensive. I think we'd be better off investing time and effort in thinning out the controller; just moving to the batch write APIs in ZooKeeper will make a huge difference (see the first sketch below).
2. I'm not sure I understood the motivation behind moving partitions off the controller broker. That seems like a proposed solution; can you describe the problems you saw that actually affected controller functionality?

Regarding the location of the controller, you seem to be suggesting two things:

1. Optimizing the strategy for picking a broker as the controller (e.g. the least loaded node).
2. Moving the controller if a broker soft-fails.

I don't think #1 is worth the effort involved; the better way to address it is to make the controller thinner and faster. #2 is interesting, since the problem is that while a broker is failing, all state changes fail or queue up, which impacts the whole cluster. There are two alternatives: a tool that lets you move the controller (see the second sketch below), or just killing the broker so the controller moves. I prefer the latter since it is simple, and also because a misbehaving broker is better off shut down anyway.

Having said that, it will be helpful to know the details of the problems you saw while operating the controller. I think understanding those will help guide the solution better.
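For concreteness, here is a minimal sketch (in Java, against the real ZooKeeper client API) of what moving to batch writes would look like. The znode paths follow Kafka's layout, but the topic name and payloads are made up for illustration:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.zookeeper.Op;
    import org.apache.zookeeper.OpResult;
    import org.apache.zookeeper.ZooKeeper;

    public class BatchedStateWrites {
        // Today the controller does one synchronous ZooKeeper write per
        // partition state change. With multi() (ZooKeeper 3.4+), the same
        // writes become a single atomic batch in one round trip.
        static List<OpResult> writeStateBatch(ZooKeeper zk, byte[] s0, byte[] s1)
                throws Exception {
            List<Op> ops = Arrays.asList(
                // Paths follow Kafka's znode layout; topic "t" is made up.
                Op.setData("/brokers/topics/t/partitions/0/state", s0, -1),
                Op.setData("/brokers/topics/t/partitions/1/state", s1, -1));
            return zk.multi(ops);
        }
    }

N per-partition updates collapse into one round trip and one atomic commit, which is presumably where much of the controller-move cost goes today.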
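And for the "tool that moves the controller" alternative, here is a sketch of roughly what it would reduce to, assuming today's election mechanism (the active controller holds the ephemeral /controller znode):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class ControllerMover {
        // The active controller owns the ephemeral /controller znode.
        // Deleting it makes the live brokers race to elect a new controller.
        static void forceReelection(ZooKeeper zk) throws Exception {
            try {
                zk.delete("/controller", -1); // -1 = any znode version
            } catch (KeeperException.NoNodeException e) {
                // No controller registered; an election is already under way.
            }
        }
    }

Note that this only forces a re-election; it doesn't let you choose which broker wins, which is part of why simply killing the misbehaving broker is just as effective.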
On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:

> This seems like a step backwards--we really don't want people to manually
> manage the location of the controller and try to manually balance
> partitions off that broker.
>
> I think it might make sense to consider directly fixing the things you
> actually want to fix:
> 1. Too many controller moves--we could either just make this cheaper or
> make the controller location more deterministic, e.g. having the election
> prefer the node with the smallest node id so there were fewer failovers
> in rolling bounces.
> 2. You seem to think having the controller on a normal node is a problem.
> Can you elaborate on the negative consequences you've observed? Let's
> focus on fixing those.
>
> In general we've worked very hard to avoid having a bunch of dedicated
> roles for different nodes, and I would be very, very loath to see us move
> away from that philosophy. I have a fair amount of experience with both
> homogeneous systems that have a single role and systems with many
> differentiated roles, and I really think that the differentiated approach
> causes more problems than it solves for most deployments due to the
> added complexity.
>
> I think we could also fix up this KIP a bit. For example, it says there
> are no public interfaces involved, but surely there are new admin
> commands to control the location? There are also some minor things, like
> listing it as released in 0.8.3, that seem wrong.
>
> -Jay
>
> On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> ani...@linkedin.com.invalid> wrote:
>
> > Hi,
> > Can we please discuss this KIP? The background for this is that it
> > allows us to pin the controller to a broker. This is useful in a couple
> > of scenarios:
> > a) If we want to do a rolling bounce, we can reduce the number of
> > controller moves down to 1.
> > b) Again, pick a designated broker, reduce the number of partitions on
> > it through admin reassign partitions, and designate it as the
> > controller.
> > c) Dynamically move the controller if we see any problems on the broker
> > on which it is running.
> >
> > Here is the wiki page:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> >
> > -Abhishek

--
Thanks,
Neha