Hi Colin,

Thanks again for checking this out.

You are right: the authorization failure (and, consequently, the internal
topics bug) is caused by a configuration problem, namely an incorrect ACL
configuration. In particular, it happens when cluster-level ACLs are
insufficient, i.e. when the broker CN needed for inter-broker communication
is not included while client SSL authentication is required:
1) the FindCoordinator request completes successfully, and the
__consumer_offsets topic is created in zk
2) but the subsequent UpdateMetadata and LeaderAndIsr requests fail, which
leaves the internal topic in a bad state

A deeper look confirmed that the change I initially proposed does not work,
since authorizing the user principal is not enough to prevent the issue.
However, I believe we should still avoid creating the internal topic(s) at
all when broker ACLs are insufficient (that is, make the FindCoordinator
request fail, since we won't have the required metadata). One possibility
would be to check for the existence of the brokers' ACLs before creating the
internal topic.
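
To make the idea concrete, here is a rough sketch of such a pre-check. It is
only an illustration and does not use Kafka's authorizer API: the class
InternalTopicGuard, its methods, and the map-based ACL view (principal ->
operations allowed on the cluster resource) are all hypothetical names
introduced for this example. The only thing taken from the discussion is
that LeaderAndIsr and UpdateMetadata are authorized as a cluster operation
(ClusterAction).

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Kafka: decides whether an internal topic
// may be created, given a simplified view of the cluster-level ACLs.
final class InternalTopicGuard {
    // Operation that inter-broker requests are authorized against.
    static final String CLUSTER_ACTION = "ClusterAction";

    private InternalTopicGuard() {}

    // Returns the broker principals that lack ClusterAction on the cluster.
    // clusterAcls is an assumed, simplified ACL view: principal -> allowed operations.
    static List<String> brokersMissingClusterAcl(List<String> brokerPrincipals,
                                                 Map<String, Set<String>> clusterAcls) {
        return brokerPrincipals.stream()
            .filter(p -> !clusterAcls
                .getOrDefault(p, Collections.<String>emptySet())
                .contains(CLUSTER_ACTION))
            .collect(Collectors.toList());
    }

    // The internal topic is created only if every broker principal is authorized;
    // otherwise the caller would fail the FindCoordinator request instead.
    static boolean canCreateInternalTopic(List<String> brokerPrincipals,
                                          Map<String, Set<String>> clusterAcls) {
        return brokersMissingClusterAcl(brokerPrincipals, clusterAcls).isEmpty();
    }
}
```

With principals such as "User:CN=broker-1" and "User:CN=broker-2" (made-up
names), the guard would refuse to create the topic as long as either one is
missing ClusterAction, so the topic never ends up half-created in zk.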
Let me know if you have any feedback.

Thanks,
Paolo


On Tue, 7 Apr 2020 at 17:12, Colin McCabe <cmcc...@apache.org> wrote:

> On Tue, Apr 7, 2020, at 08:08, Paolo Moriello wrote:
> > Hi Colin,
> >
> > Thanks for your interest in this. I agree with you, this change could break
> > compatibility. However, changing the source principal is non trivial in
> > this case. In fact, here the problem is not in the internal topic creation
> > - which succeeds - but in the two subsequent LeaderAndIsr and
> > UpdateMetadata requests.
> >
> > When a consumer tries to consume for the first time, the creation of
> > internal topic completes, zk-nodes are filled with the necessary metadata,
> > and this triggers a ZkPartitionStateMachine (PartitionStateMachine.scala)
> > update which, in turn, makes the ControllerChannelManager
> > (ControllerChannelManager.scala) send LeaderAndIsr and UpdateMetadata
> > requests to the brokers; (I can be wrong, but I believe that these requests
> > are already being executed with broker principal). These requests fail
> > because we authorize the cluster operation there, so the __consumer_offsets
> > topic remains in a bad state.
>
> I might be misunderstanding something here, but it seems to me that if
> LeaderAndIsrRequest or UpdateMetadataRequest are failing with authorization
> errors, then there is a configuration problem on the cluster which doesn't
> have anything to do with the __consumer_offsets topic.
>
> >
> > Is there a reason to not authorize the operation for find coordinator
> > requests as well?
>
> To be clear, we can't change the authorization for FindCoordinatorRequest.
>
> best,
> Colin
>
