Hi Colin,

The read/modify/write is protected by the zk version, right?
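
To make the question concrete, here is a minimal Python sketch of the interleaving described in the quoted message below, run once with unconditional writes and once with a version check. `VersionedNode`, `BadVersionError`, and `interleaved_update` are hypothetical in-memory stand-ins, not Kafka or ZooKeeper client code, and the config keys are only examples. ZooKeeper's setData accepts an expected version and rejects the write on a mismatch, which is the behavior modeled here; whether every write path actually passes that version is the open question.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedNode:
    """Hypothetical in-memory stand-in for a config znode (not a real client)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0

    def read(self):
        # Return a copy of the data together with the version it was read at.
        return dict(self.data), self.version

    def write(self, data, expected_version=None):
        # expected_version=None models an unconditional write.
        if expected_version is not None and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, found {self.version}")
        self.data = dict(data)
        self.version += 1


def interleaved_update(node, conditional):
    # Both writers read before either one writes.
    cfg1, v1 = node.read()
    cfg2, v2 = node.read()
    cfg1["retention.ms"] = "1000"
    cfg2["cleanup.policy"] = "compact"
    node.write(cfg1, v1 if conditional else None)
    try:
        node.write(cfg2, v2 if conditional else None)
    except BadVersionError:
        # Writer 2 re-reads and reapplies its change on top of writer 1's.
        cfg2, v2 = node.read()
        cfg2["cleanup.policy"] = "compact"
        node.write(cfg2, v2)
    return node.data


lost = interleaved_update(VersionedNode({}), conditional=False)
kept = interleaved_update(VersionedNode({}), conditional=True)
print(lost)
print(kept)
```

Without the version check, writer 1's change is silently overwritten; with it, writer 2's stale write is rejected and the retry preserves both changes.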

Ismael

On Fri, Apr 17, 2020 at 12:53 PM Colin McCabe <cmcc...@apache.org> wrote:

> On Thu, Apr 16, 2020, at 08:51, Ismael Juma wrote:
> > I don't think these requests are necessarily infrequent under
> > multi-tenant environments though. I've seen Controller availability being
> > an issue for describe topics, for example (before it was changed to go to
> > any broker).
>
> Hi Ismael,
>
> I don't think DescribeTopics is a good comparison.  That RPC is available
> to regular users and is used many orders of magnitude more frequently than
> administrative operations like changing ACLs or setting quotas.
>
> The operations we're talking about redirecting here all require the
> highest possible permissions and will not be frequent in any real-world
> cluster... unless someone is running a stress-test or a benchmark.  We
> didn't even notice some of the serious bugs in setting dynamic configs
> until recently because the alterConfigs / incrementalAlterConfigs RPCs are
> so infrequently called.
>
> Additionally, this KIP fixes some existing bugs.  The current approach of
> having random writers do a read-modify-write cycle on a configuration znode
> is buggy since it could be interleaved with another node's
> read-modify-write cycle.  It has a "lost updates" problem.
>
> For example, node 1 reads a config znode.  Node 2 reads the same config
> znode.  Node 1 writes back a modified version of the znode.  Node 2 writes
> back its (differently) modified version, overwriting the changes from node
> 1.
>
> I don't think anyone ever noticed this problem since, again, these
> operations are very infrequent, making the chance of such a collision low.
> But it is a serious bug that is fixed by having a single writer.  (We
> should add this to the KIP...)
>
> >
> > Would it be better to redirect once the controller quorum is there?
>
> This KIP is needed for the bridge release.  The bridge release upgrade
> process relies on the old nodes sending their administrative operations to
> the controller quorum, not directly to ZooKeeper.
>
> best,
> Colin
>
>
> >
> > Note that this is different from things like AlterIsr since these calls
> > are coming from clients versus other brokers.
> >
> > Ismael
> >
> > On Wed, Apr 15, 2020, 5:10 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Hi Ismael,
> > >
> > > I agree that sending these requests through the controller will not
> > > work during the periods when there is no controller.  However, those
> > > periods should be short-- otherwise we have bigger problems in the
> > > cluster.
> > >
> > > These requests are very infrequent because they are administrative
> > > operations.  Basically the affected operations are changing ACLs,
> > > changing dynamic configurations, and changing quotas.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Apr 15, 2020, at 15:25, Ismael Juma wrote:
> > > > Hi Boyang,
> > > >
> > > > Thanks for the KIP. Have we considered that this reduces availability
> > > > for these operations since we have a single Controller instead of the
> > > > ZK quorum?
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Apr 3, 2020 at 4:45 PM Boyang Chen <reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey all,
> > > > >
> > > > > I would like to start off the discussion for KIP-590, a follow-up
> > > > > initiative after KIP-500:
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > > > >
> > > > > This KIP proposes to migrate existing Zookeeper mutation paths,
> > > > > including configuration, security and quota changes, to
> > > > > controller-only by always routing these alterations to the controller.
> > > > >
> > > > > Let me know your thoughts!
> > > > >
> > > > > Best,
> > > > > Boyang
> > > > >
> > > >
> > >
> >
>
