Hi Ewen, Thanks for looking at the KIP. I've updated it for some of your comments, but see also a few replies inline.
On 9 August 2017 at 06:02, Ewen Cheslack-Postava <e...@confluent.io> wrote: > Thanks for the KIP. Generally the move away from ZK and to native Kafka > requests is good, so I'm generally +1 on this. A couple of > comments/questions. > > * You gave the signature > electPreferredReplicaLeader(Collection<TopicPartition> partitions) for the > admin client. The old command allows not specifying the topic partitions > which results in election for all topic partitions. How would this be > expressed in the new API? Does the tool need to do a metadata request and > then issue the request for all topic partitions or would there be a > shortcut via the protocol (e.g. another electPreferredReplicaLeader() with > no args which translates to, e.g., an empty list or null in the request)? > I was trying to figure out the best way to do this. It seems there are no methods in the AdminClient setting a precedent on this. I would prefer to avoid having to do a metadata request to discover all the partitions. So the two ways I can think of are: 1. As you suggest, an empty or null list in the request would signify "all partitions". This is not intuitive, and using an empty list is downright surprising (an empty list should be a noop after all). 2. We could have an allPartitions flag in the Options class. This is also unintuitive because it would ignore whatever partitions were in the list, though I suppose the AdminClient itself could check for this and signal an error. Of these options, I think using a null list in the request is the least surprising. > * All the existing *Options classes for the AdminClient have at least a > timeout. Should we add that to this APIs options? > * Other AdminClient *Result classes provide some convenience methods when > you just want all the results. Should we have that as well? > * I don't think any other requests include error strings in the response, > do they? I'm pretty sure we just generate the error string on the client > side based on the error code. If an error code was ambiguous, we should > probably just split it across 2 or more codes. > There are responses with detailed error messages as well as the codes, CreateTopicsResponse, {Describe|Alter}ConfigsResponse, and the responses for managing ACLs for instance. To be honest, I assumed including a message was the norm. In this case, however, I don't think there is any extra detail to include beyond the error itself, so I've removed it from the KIP. > * re: permissions, would there also be a per-topic permission required? (I > haven't kept track of all the ACLs and required permissions so I'm not > sure, but a bunch of other operations check per-resource they are modifying > -- this operation could be considered to be against either Cluster or > Topics.) > I'm also unsure, and would welcome input from others about this. Note that there is some similarity here with some of the discussion on https://issues.apache.org/jira/browse/KAFKA-5638: To what extent should some permission on the Cluster imply some permission across all Topics in the cluster? My feeling is that there should be no such implication, because it will just make ACLs harder to reason about, and require more documentation about what implies what else. Getting back to this specific case, it would make some sense for Alter on the Topic to be necessary, though it could be argued that the leader of a partition is an operational concern, and not really part of the topic configuration. Changing the leader of a bunch of partitions could affect partitions of other topics. On the other hand, anyone with Alter on the topic would presumably have the freedom to set the preferred leader for a partition, so it would be a little strange if they couldn't actually give effect to that choice. On balance I think I prefer needing alter on the topic (and have adjusted the KIP accordingly), but I'm very happy to go with the consensus view. > * Should we consider making the request only valid against the broker and > having the command do a metadata request to discover the controller and > then send the request directly to it? I think this could a) simplify the > implementation a bit by taking the ZK node out of the equation b) avoid the > REPLICA_LEADER_ELECTION_IN_PROGRESS error since this is only required due > to using the ZK node to communicate with the controller afaict and c) > potentially open the door for a synchronous request/response in the future. > > That's a really good point. I agree that the request should be made to the controller, because it allows us to make improvements in future. I'm not totally sure about taking ZK out of the equation completely though: Right now during controller failover the new controller checks the znode and ensures those elections get honoured. So it seems we need to keep ZK in the picture, and without rewriting how we read and write to that ZK node to support treating it more like a queue, it seems simplest to just give up when the node is already present. Also, such a rewrite would be simpler once the support for --zookeeper was removed from the tool, since then it would only be the controller that was changing it, avoiding the possibility of races. For now I've kept this error code in the KIP, but happy to revise that if this interpretation is wrong. > -Ewen > > On Mon, Aug 7, 2017 at 8:21 AM, Tom Bentley <t.j.bent...@gmail.com> wrote: > > > Hi, > > > > I've updated this KIP slightly, to clarify a couple of points. > > > > One thing in particular that I would like feedback on is what > authorization > > should be required for triggering the election of the preferred leader? > > > > Another thing would be whether the request and responses should be > grouped > > by topic name. This would make for smaller messages when triggering > > elections for multiple partitions of the same topic. > > > > I'd be grateful for any feedback you may have. > > > > Cheers, > > > > Tom > > > > On 2 August 2017 at 18:34, Tom Bentley <t.j.bent...@gmail.com> wrote: > > > > > In a similar vein to KIP-179 I've created KIP-183 ( > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-183+-+Change+ > > > PreferredReplicaLeaderElectionCommand+to+use+AdminClient) which is > about > > > deprecating the --zookeeper option to kafka-preferred-replica- > > election.sh > > > and replacing it with an option which would use a new AdminClient-based > > API. > > > > > > As it stands the KIP is focussed on simply moving the existing > > > functionality behind the AdminClient. > > > > > > I'd be grateful for any feedback people may have on this. > > > > > > Thanks, > > > > > > Tom > > > > > >