On Fri, Apr 17, 2020, at 13:11, Ismael Juma wrote:
> Hi Colin,
>
> The read/modify/write is protected by the zk version, right?
>
> Ismael

No, we don't use the ZK version when doing the write to the config znodes. We do for ACLs, I think. This is something that we could fix just by using the ZK version, but there are other race conditions like what if we're deleting a topic while setting this config, etc. A single writer is a lot easier to reason about.

best,
Colin

>
> On Fri, Apr 17, 2020 at 12:53 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Thu, Apr 16, 2020, at 08:51, Ismael Juma wrote:
> > > I don't think these requests are necessarily infrequent under multi tenant
> > > environments though. I've seen Controller availability being an issue for
> > > describe topics for example (before it was changed to go to any broker).
> >
> > Hi Ismael,
> >
> > I don't think DescribeTopics is a good comparison. That RPC is available
> > to regular users and is used many orders of magnitude more frequently than
> > administrative operations like changing ACLs or setting quotas.
> >
> > The operations we're talking about redirecting here all require the
> > highest possible permissions and will not be frequent in any real-world
> > cluster... unless someone is running a stress-test or a benchmark. We
> > didn't even notice some of the serious bugs in setting dynamic configs
> > until recently because the alterConfigs / incrementalAlterConfigs RPCs are
> > so infrequently called.
> >
> > Additionally, this KIP fixes some existing bugs. The current approach of
> > having random writers do a read-modify-write cycle on a configuration znode
> > is buggy since it could be interleaved with another node's read-modify-write
> > cycle. It has a "lost updates" problem.
> >
> > For example, node 1 reads a config znode. Node 2 reads the same config
> > znode. Node 1 writes back a modified version of the znode. Node 2 writes
> > back its (differently) modified version, overwriting the changes from node
> > 1.
> >
> > I don't think anyone ever noticed this problem since, again, these
> > operations are very infrequent, making the chance of such a collision low.
> > But it is a serious bug that is fixed by having a single writer. (We
> > should add this to the KIP...)
> >
> > >
> > > Would it be better to redirect once the controller quorum is there?
> >
> > This KIP is needed for the bridge release. The bridge release upgrade
> > process relies on the old nodes sending their administrative operations to
> > the controller quorum, not directly to zookeeper.
> >
> > best,
> > Colin
> >
> > >
> > > Note that this is different from things like AlterIsr since these calls are
> > > coming from clients versus other brokers.
> > >
> > > Ismael
> > >
> > > On Wed, Apr 15, 2020, 5:10 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > I agree that sending these requests through the controller will not work
> > > > during the periods when there is no controller. However, those periods
> > > > should be short-- otherwise we have bigger problems in the cluster.
> > > >
> > > > These requests are very infrequent because they are administrative
> > > > operations. Basically the affected operations are changing ACLs, changing
> > > > dynamic configurations, and changing quotas.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Wed, Apr 15, 2020, at 15:25, Ismael Juma wrote:
> > > > > Hi Boyang,
> > > > >
> > > > > Thanks for the KIP. Have we considered that this reduces availability for
> > > > > these operations since we have a single Controller instead of the ZK
> > > > > quorum?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Apr 3, 2020 at 4:45 PM Boyang Chen <reluctanthero...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > I would like to start off the discussion for KIP-590, a follow-up
> > > > > > initiative after KIP-500:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > > > > >
> > > > > > This KIP proposes to migrate existing Zookeeper mutation paths, including
> > > > > > configuration, security and quota changes, to controller-only by always
> > > > > > routing these alterations to the controller.
> > > > > >
> > > > > > Let me know your thoughts!
> > > > > >
> > > > > > Best,
> > > > > > Boyang
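
The "lost updates" interleaving described in the thread (node 1 reads, node 2 reads, node 1 writes, node 2 overwrites) can be sketched with a small in-memory model. This is only an illustration, not Kafka or ZooKeeper code: the `ZNode` class, the `BadVersion` exception, both demo functions, and the config keys used are hypothetical stand-ins. The second function shows what a version-checked write, the kind of fix Ismael asks about, would look like under these assumptions.

```python
class BadVersion(Exception):
    """Raised when a conditional write sees a stale version (hypothetical)."""


class ZNode:
    """In-memory stand-in for one config znode; not a real ZooKeeper client."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0

    def read(self):
        # Return a copy of the data plus the version it was read at.
        return dict(self.data), self.version

    def write(self, data, expected_version=None):
        # expected_version=None models an unconditional write.
        if expected_version is not None and expected_version != self.version:
            raise BadVersion(f"expected {expected_version}, found {self.version}")
        self.data = dict(data)
        self.version += 1


def lost_update_demo():
    """Unconditional writes: node 2 silently discards node 1's change."""
    znode = ZNode({"retention.ms": "1000"})
    cfg1, _ = znode.read()               # node 1 reads
    cfg2, _ = znode.read()               # node 2 reads the same state
    cfg1["cleanup.policy"] = "compact"
    znode.write(cfg1)                    # node 1 writes back
    cfg2["retention.ms"] = "5000"
    znode.write(cfg2)                    # node 2 overwrites node 1's change
    return znode.data


def versioned_demo():
    """Version-checked writes: node 2's stale write is rejected and retried."""
    znode = ZNode({"retention.ms": "1000"})
    cfg1, v1 = znode.read()
    cfg2, v2 = znode.read()
    cfg1["cleanup.policy"] = "compact"
    znode.write(cfg1, expected_version=v1)
    cfg2["retention.ms"] = "5000"
    try:
        znode.write(cfg2, expected_version=v2)
    except BadVersion:
        # Stale read detected: re-read, re-apply the change, write again.
        fresh, v = znode.read()
        fresh["retention.ms"] = "5000"
        znode.write(fresh, expected_version=v)
    return znode.data
```

In this model, `lost_update_demo()` ends with node 1's `cleanup.policy` change missing, while `versioned_demo()` keeps both changes. As Colin notes in the thread, a version check of this kind would not cover other races, such as a topic being deleted while its config is being set, which is part of the argument for a single writer.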