Re: [DISCUSS] KIP-183 - Change PreferredReplicaLeaderElectionCommand to use AdminClient

Tom Bentley Wed, 09 Aug 2017 03:40:43 -0700

Hi Ewen,

Thanks for looking at the KIP. I've updated it for some of your comments,
but see also a few replies inline.

On 9 August 2017 at 06:02, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> Thanks for the KIP. Generally the move away from ZK and to native Kafka
> requests is good, so I'm generally +1 on this. A couple of
> comments/questions.
>
> * You gave the signature
> electPreferredReplicaLeader(Collection<TopicPartition> partitions) for the
> admin client. The old command allows not specifying the topic partitions
> which results in election for all topic partitions. How would this be
> expressed in the new API? Does the tool need to do a metadata request and
> then issue the request for all topic partitions or would there be a
> shortcut via the protocol (e.g. another electPreferredReplicaLeader() with
> no args which translates to, e.g., an empty list or null in the request)?
>

I was trying to figure out the best way to do this. It seems there are no
methods in the AdminClient setting a precedent on this. I would prefer to
avoid having to do a metadata request to discover all the partitions. So
the two ways I can think of are:

1. As you suggest, an empty or null list in the request would signify "all
partitions". This is not intuitive, and using an empty list is downright
surprising (an empty list should be a noop after all).
2. We could have an allPartitions flag in the Options class. This is also
unintuitive because it would ignore whatever partitions were in the list,
though I suppose the AdminClient itself could check for this and signal an
error.

Of these options, I think using a null list in the request is the least
surprising.

> * All the existing *Options classes for the AdminClient have at least a
> timeout. Should we add that to this APIs options?
> * Other AdminClient *Result classes provide some convenience methods when
> you just want all the results. Should we have that as well?
> * I don't think any other requests include error strings in the response,
> do they? I'm pretty sure we just generate the error string on the client
> side based on the error code. If an error code was ambiguous, we should
> probably just split it across 2 or more codes.
>

There are responses with detailed error messages as well as the codes,
CreateTopicsResponse, {Describe|Alter}ConfigsResponse, and the responses
for managing ACLs for instance. To be honest, I assumed including a message
was the norm. In this case, however, I don't think there is any extra
detail to include beyond the error itself, so I've removed it from the KIP.

> * re: permissions, would there also be a per-topic permission required? (I
> haven't kept track of all the ACLs and required permissions so I'm not
> sure, but a bunch of other operations check per-resource they are modifying
> -- this operation could be considered to be against either Cluster or
> Topics.)
>

I'm also unsure, and would welcome input from others about this. Note that
there is some similarity here with some of the discussion on
https://issues.apache.org/jira/browse/KAFKA-5638: To what extent should
some permission on the Cluster imply some permission across all Topics in
the cluster? My feeling is that there should be no such implication,
because it will just make ACLs harder to reason about, and require more
documentation about what implies what else.

Getting back to this specific case, it would make some sense for Alter on
the Topic to be necessary, though it could be argued that the leader of a
partition is an operational concern, and not really part of the topic
configuration. Changing the leader of a bunch of partitions could affect
partitions of other topics. On the other hand, anyone with Alter on the
topic would presumably have the freedom to set the preferred leader for a
partition, so it would be a little strange if they couldn't actually give
effect to that choice.

On balance I think I prefer needing alter on the topic (and have adjusted
the KIP accordingly), but I'm very happy to go with the consensus view.

> * Should we consider making the request only valid against the broker and
> having the command do a metadata request to discover the controller and
> then send the request directly to it? I think this could a) simplify the
> implementation a bit by taking the ZK node out of the equation b) avoid the
> REPLICA_LEADER_ELECTION_IN_PROGRESS error since this is only required due
> to using the ZK node to communicate with the controller afaict and c)
> potentially open the door for a synchronous request/response in the future.
>
>
That's a really good point. I agree that the request should be made to the
controller, because it allows us to make improvements in future. I'm not
totally sure about taking ZK out of the equation completely though: Right
now during controller failover the new controller checks the znode and
ensures those elections get honoured. So it seems we need to keep ZK in the
picture, and without rewriting how we read and write to that ZK node to
support treating it more like a queue, it seems simplest to just give up
when the node is already present. Also, such a rewrite would be simpler
once the support for --zookeeper was removed from the tool, since then it
would only be the controller that was changing it, avoiding the possibility
of races.

For now I've kept this error code in the KIP, but happy to revise that if
this interpretation is wrong.

> -Ewen
>
> On Mon, Aug 7, 2017 at 8:21 AM, Tom Bentley <t.j.bent...@gmail.com> wrote:
>
> > Hi,
> >
> > I've updated this KIP slightly, to clarify a couple of points.
> >
> > One thing in particular that I would like feedback on is what
> authorization
> > should be required for triggering the election of the preferred leader?
> >
> > Another thing would be whether the request and responses should be
> grouped
> > by topic name. This would make for smaller messages when triggering
> > elections for multiple partitions of the same topic.
> >
> > I'd be grateful for any feedback you may have.
> >
> > Cheers,
> >
> > Tom
> >
> > On 2 August 2017 at 18:34, Tom Bentley <t.j.bent...@gmail.com> wrote:
> >
> > > In a similar vein to KIP-179 I've created KIP-183 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-183+-+Change+
> > > PreferredReplicaLeaderElectionCommand+to+use+AdminClient) which is
> about
> > > deprecating the --zookeeper option to kafka-preferred-replica-
> > election.sh
> > > and replacing it with an option which would use a new AdminClient-based
> > API.
> > >
> > > As it stands the KIP is focussed on simply moving the existing
> > > functionality behind the AdminClient.
> > >
> > > I'd be grateful for any feedback people may have on this.
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> >
>

Re: [DISCUSS] KIP-183 - Change PreferredReplicaLeaderElectionCommand to use AdminClient

Reply via email to