On Thu, Apr 16, 2020, at 08:51, Ismael Juma wrote:
> I don't think these requests are necessarily infrequent under multi tenant
> environments though. I've seen Controller availability being an issue for
> describe topics for example (before it was changed to go to any broker).

Hi Ismael,

I don't think DescribeTopics is a good comparison. That RPC is available to regular users and is called many orders of magnitude more often than administrative operations like changing ACLs or setting quotas. The operations we're talking about redirecting here all require the highest possible permissions and will not be frequent in any real-world cluster... unless someone is running a stress test or a benchmark. We didn't even notice some of the serious bugs in setting dynamic configs until recently, because the alterConfigs / incrementalAlterConfigs RPCs are called so rarely.

Additionally, this KIP fixes some existing bugs. The current approach, where arbitrary brokers each do a read-modify-write cycle on a configuration znode, is buggy because one node's cycle can be interleaved with another node's read-modify-write cycle. It has a "lost updates" problem. For example, node 1 reads a config znode. Node 2 reads the same config znode. Node 1 writes back a modified version of the znode. Node 2 writes back its (differently) modified version, overwriting the changes from node 1. I don't think anyone ever noticed this problem since, again, these operations are very infrequent, which makes such a collision unlikely. But it is a serious bug, and having a single writer fixes it. (We should add this to the KIP...)

>
> Would it be better to redirect once the controller quorum is there?

This KIP is needed for the bridge release. The bridge release upgrade process relies on the old nodes sending their administrative operations to the controller quorum, not directly to ZooKeeper.

best,
Colin

>
> Note that this is different from things like AlterIsr since these calls are
> coming from clients versus other brokers.
>
> Ismael
>
> On Wed, Apr 15, 2020, 5:10 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Ismael,
> >
> > I agree that sending these requests through the controller will not work
> > during the periods when there is no controller. However, those periods
> > should be short-- otherwise we have bigger problems in the cluster.
> >
> > These requests are very infrequent because they are administrative
> > operations. Basically the affected operations are changing ACLs, changing
> > dynamic configurations, and changing quotas.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Apr 15, 2020, at 15:25, Ismael Juma wrote:
> > > Hi Boyang,
> > >
> > > Thanks for the KIP. Have we considered that this reduces availability for
> > > these operations since we have a single Controller instead of the ZK
> > > quorum?
> > >
> > > Ismael
> > >
> > > On Fri, Apr 3, 2020 at 4:45 PM Boyang Chen <reluctanthero...@gmail.com>
> > > wrote:
> > >
> > > > Hey all,
> > > >
> > > > I would like to start off the discussion for KIP-590, a follow-up
> > > > initiative after KIP-500:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > > >
> > > > This KIP proposes to migrate existing Zookeeper mutation paths, including
> > > > configuration, security and quota changes, to controller-only by always
> > > > routing these alterations to the controller.
> > > >
> > > > Let me know your thoughts!
> > > >
> > > > Best,
> > > > Boyang
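To make the "lost updates" race from Colin's reply concrete, here is a minimal Python sketch. It is not Kafka or ZooKeeper code: `FakeZnode`, `SingleWriter`, `set_key`, and `interleaved_writers` are hypothetical names. The znode is modeled as an in-memory dict that each write replaces wholesale (no version check), and `retention.ms` / `cleanup.policy` are just example config keys. The sketch replays the exact interleaving described in the email, then shows how funnelling every mutation through one writer (the role this KIP gives the controller) keeps both changes.

```python
# Illustrative sketch only: hypothetical names, in-memory stand-in for a znode.
from collections import deque
from typing import Callable, Deque, Dict

Config = Dict[str, str]
Mutation = Callable[[Config], Config]


class FakeZnode:
    """Holds a whole config map; a write replaces it entirely (no version check)."""

    def __init__(self, data: Config) -> None:
        self.data = dict(data)

    def read(self) -> Config:
        return dict(self.data)  # each reader gets its own private copy

    def write(self, data: Config) -> None:
        self.data = dict(data)


def interleaved_writers(znode: FakeZnode) -> Config:
    """Two brokers each do read-modify-write, interleaved as in the email."""
    snapshot1 = znode.read()             # node 1 reads the znode
    snapshot2 = znode.read()             # node 2 reads the same contents
    snapshot1["retention.ms"] = "1000"
    znode.write(snapshot1)               # node 1 writes back its change
    snapshot2["cleanup.policy"] = "compact"
    znode.write(snapshot2)               # node 2 overwrites node 1's change
    return znode.read()


class SingleWriter:
    """Hypothetical controller: the only component that ever writes the znode.
    Brokers submit mutations; the writer applies them one at a time."""

    def __init__(self, znode: FakeZnode) -> None:
        self.znode = znode
        self.pending: Deque[Mutation] = deque()

    def submit(self, mutation: Mutation) -> None:
        self.pending.append(mutation)

    def drain(self) -> Config:
        while self.pending:
            mutation = self.pending.popleft()
            # The whole read-modify-write runs inside the single writer, so no
            # other write can land between this read and this write.
            self.znode.write(mutation(self.znode.read()))
        return self.znode.read()


def set_key(key: str, value: str) -> Mutation:
    """Build a mutation that sets one config key, leaving the others intact."""
    def apply(config: Config) -> Config:
        updated = dict(config)
        updated[key] = value
        return updated
    return apply


if __name__ == "__main__":
    # Racy path: node 1's retention.ms update is silently lost.
    print(interleaved_writers(FakeZnode({})))  # {'cleanup.policy': 'compact'}

    # Single-writer path: both updates survive.
    controller = SingleWriter(FakeZnode({}))
    controller.submit(set_key("retention.ms", "1000"))
    controller.submit(set_key("cleanup.policy", "compact"))
    print(controller.drain())  # {'retention.ms': '1000', 'cleanup.policy': 'compact'}
```

The writer needs no locking in this sketch only because a single component ever calls `write`. In the real system that guarantee would come from routing the requests to the controller, which nothing in this toy model enforces.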