Hey Tom, Thanks for the quick reply. Please see my comment inline.
On Tue, Aug 8, 2017 at 11:06 AM, Tom Bentley <t.j.bent...@gmail.com> wrote: > Hi Dong, > > Replies inline, as usual > > > As I originally envisaged it, KIP-179's support for reassigning > partitions > > > > would have more-or-less taken the logic currently in the > > > ReassignPartitionsCommand (that is, writing JSON to the > > > ZkUtils.ReassignPartitionsPath) > > > and put it behind a suitable network protocol API. Thus it wouldn't > > matter > > > which broker received the protocol call: It would be acted on by > brokers > > > being notified of the change in the ZK path, just as currently. This > > would > > > have kept the ReassignPartitionsCommand relatively simple, as it > > currently > > > is. > > > > > > > I am not sure I fully understand your proposal. I think you are saying > that > > any broker can receive and handle the AlterTopicRequest. > > > That's right. > > > > Let's say a > > non-controller broker received AlterTopicRequest, is this broker going to > > send LeaderAndIsrRequest to other brokers? Or is this broker create the > > reassignment znode in zookeper? > > > Exactly: It's going to write some JSON to the relevant znode. Other brokers > will get notified by zk when the contents of this znode changes, and do as > they do now. This is what the tool/script does now. > > I will confess that I don't completely understand the role of > LeaderAndIsrRequest, since the current code just seems to write to the > znode do get the brokers to do the reassignment. If you could explain the > role of LeaderAndIsrRequest that would be great. > Currently only the controller will listen to the reassignment znode and sends LeaderAndIsrRequest and StopReplicaRequest to brokers in order to complete reassignment. Brokers won't need to listen to zookeeper for any reassignment -- brokers only reacts to the request from controller. Currently Kafka's design replies a lot on the controller to keep a consistent view of who are the leader of partitions and what is the ISR etc. It will be a pretty drastic change, if not impossible, for the script to reassign partitions without going through controller. Thus I think it is likely that your AlterTopicsRequest can only be sent to controller. Then the controller can create the reassignment znode in zookeeper so that the information is persisted across controller fail over. I haven't think through this in detail though. > > > > I may have missed it. But I couldn't find > > the explanation of AlterTopicRequest handling in KIP-179. > > > > You're right, it doesn't go into that much detail. I will fix that. > > > > > > > > KIP-113 is obviously seeking to make more radical changes. The > algorithm > > > described for moving a replica to a particular directory on a different > > > broker ( > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 113%3A+Support+replicas+movement+between+log+directories#KIP-113: > > > Supportreplicasmovementbetweenlogdirectories-2) > > > Howtoreassignreplicabetweenlogdirectoriesacrossbrokers > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > 113%3A+Support+replicas+movement+between+log+directories#KIP-113: > > > Supportreplicasmovementbetweenlogdirectories-2% > > > 29Howtoreassignreplicabetweenlogdirectoriesacrossbrokers>) > > > involves both sending AlterReplicaDirRequest to "the" broker (the > > receiving > > > broker, I assume, but it's not spelled out), _as well as_ writing to > the > > ZK > > > node. > > > > > > This assumes the script (ReassignPartitionsCommand) has direct access > to > > > ZooKeeper, which is what KIP-179 is seeking to deprecate. It seems a > > waste > > > of time to put the logic in the script as part of KIP-113, only for > > KIP-179 > > > to have to move it back to the controller. > > > > > > > I am not sure I understand what you mean by "It seems a waste of time to > > put the logic in the script as part of KIP-113, only for KIP-179 to have > to > > move it back to the controller". > > > Sorry, I misunderstood slightly what you were proposing in KIP-113, so the > "waste of time" comment isn't quite right, but I'm still not convinced that > KIP-113+KIP-179 (in its current form) ends with a satisfactory result. > > Let me elaborate... KIP-113 says that to support reassigning replica > between log directories across brokers: > * ... > * The script sends AlterReplicaDirRequest to those brokers which need to > move replicas... > * The script creates reassignment znode in zookeeper. > * The script retries AlterReplicaDirRequest to those broker... > * ... > > So the ReassignPartitionsCommand still talks to ZK directly, but now it's > bracketed by calls to the AdminClient. KIP-179 could replace that talking > to ZK directly with a new call to the AdminClient. But then we've got a > pretty weird API, where we have to make three AdminClient calls (two of > them to the same method), to move a replica. I don't really understand why > the admin client can't present a single API method to achieve this, and > encapsulate on the server side the careful sequence of events necessary to > coordinate the movement. I understood this position is what Ismael was > advocating when he said it was better to put the logic in the controller > than spread between the script and the controller. But maybe I > misunderstood him. > I have some concern with putting this logic in controller which can be found in my previous email. Before that is addressed, the script (or AdminClient) seems to be the simplest place to have this logic. I agree it is better to have a single API to achieve both partition and replica -> dir assignment. I think it is likely that we will find a good API to do both. I have updated the KIP-113 to remove API alterReplicaDir from AdminClient interface. > > > > I assume that the logic you mentioned is > > "movement of replica to the specified log directory". This logic (or the > > implementation of this logic) resides mainly in the KafkaAdminClient and > > broker. The script only needs to parse the json file as appropriate and > > call the new API in AdminClient as appropriate. The logic in the script > is > > therefore not much and can be easily moved to other classes if needed. > > > > Can you clarify why this logic, i.e. movement of replica to the specified > > log directory, needs to be moved to controller in KIP-179? I think it can > > still be done in the script and controller should not need to worry about > > log directory of any replica. > > > > Thanks, > > Dong > > >