On Thu, Apr 9, 2020, at 09:36, Paolo Moriello wrote:
> Hi Colin,
>
> Thanks again for checking this out.
>
> Indeed you are right, a configuration problem is what leads to
> authorization failure (and consequently to the internal topics bug): i.e.
> incorrect ACLs configuration. In particular, in case of insufficient
> cluster-level ACLs, so if one does not include the broker CN required to
> allow inter-broker communication when client SSL is required:
> 1) FindCoordinator request completes successfully, and __consumer_offsets
> topic is created in zk
> 2) but subsequent UpdateMetadata and LeaderAndIsr fail. This leaves the
> internal topic in a bad state
>
> A deeper look confirmed that the change I proposed initially does not work,
> since authorizing the user principal is not enough to prevent the issue.
> However, I believe that we should still avoid creating the internal
> topic(s) at all in case of insufficient broker ACLs (which means, make
> FindCoordinator request fail since we won't have the required metadata). A
> possibility could be to try to check the existence of brokers' ACLs before
> creating the internal topic.
> Let me know if you have any feedback.

Hi Paolo,

If the problem is broker ACLs being configured incorrectly so that it
can't receive requests from the controller, a lot of things will fail.
This isn't really related to anything with FindCoordinator.

best,
Colin

>
> Thanks,
> Paolo
>
>
> On Tue, 7 Apr 2020 at 17:12, Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Tue, Apr 7, 2020, at 08:08, Paolo Moriello wrote:
> > > Hi Colin,
> > >
> > > Thanks for your interest in this. I agree with you, this change could
> > > break compatibility. However, changing the source principal is
> > > non-trivial in this case. In fact, here the problem is not in the
> > > internal topic creation - which succeeds - but in the two subsequent
> > > LeaderAndIsr and UpdateMetadata requests.
> > >
> > > When a consumer tries to consume for the first time, the creation of
> > > internal topic completes, zk-nodes are filled with the necessary
> > > metadata, and this triggers a ZkPartitionStateMachine
> > > (PartitionStateMachine.scala) update which, in turn, makes the
> > > ControllerChannelManager (ControllerChannelManager.scala) send
> > > LeaderAndIsr and UpdateMetadata requests to the brokers; (I can be
> > > wrong, but I believe that these requests are already being executed
> > > with broker principal). These requests fail because we authorize the
> > > cluster operation there, so the __consumer_offsets topic remains in a
> > > bad state.
> >
> > I might be misunderstanding something here, but it seems to me that if
> > LeaderAndIsrRequest or UpdateMetadataRequest are failing with authorization
> > errors, then there is a configuration problem on the cluster which doesn't
> > have anything to do with the __consumer_offsets topic.
> >
> > >
> > > Is there a reason to not authorize the operation for find coordinator
> > > requests as well?
> >
> > To be clear, we can't change the authorization for FindCoordinatorRequest.
> >
> > best,
> > Colin