Re: [DISCUSS] KIP-30 Allow for brokers to have plug-able consensus and meta data storage sub systems

Onur Karaman Tue, 01 Dec 2015 13:45:44 -0800

+1 on what Neha said.

On Tue, Dec 1, 2015 at 1:41 PM, Neha Narkhede <n...@confluent.io> wrote:


> I share Jay's concerns around plugins and he explained that very well, so I
> will avoid repeating those.
>
> Agree that there is some feedback about getting rid of ZooKeeper altogether
> and I'm on board with building a native version whenever we are ready to
> take this on. But I don't think the pain is big enough to solve this by
> providing a short-term solution via plugins.
>
> With respect to ZooKeeper, a more significant issue is performance and
> correct use of ZooKeeper APIs. First, switching to the bulk read and
> write APIs that ZooKeeper released a while ago will make a lot of
> things around failover better and faster. Second, there is value in not
> wrapping the core ZooKeeper APIs in third-party plugins (whether it is
> ZkClient or Curator) since they try to mask functionality in ways that end
> up making it very tricky to write correct code. Case in point:
> https://issues.apache.org/jira/browse/KAFKA-1387. I suspect there are
> places in our code that still don't use the ZK watcher functionality
> correctly with the impact being losing important notifications and not
> acting on certain state changes at all. This is because we depend on
> ZkClient and it ends up hiding some details of the ZooKeeper API that
> shouldn't be hidden to handle such cases correctly.
>
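A minimal sketch of the kind of lost notification Neha describes above. It assumes the classic ZooKeeper behavior in which a watch is a one-time trigger that has to be set again on the next read. `TinyStore` and `run` are hypothetical names made up for this illustration; this is an in-memory simulation, not the ZooKeeper or ZkClient API.

```python
class TinyStore:
    """In-memory stand-in for one znode with one-shot watches (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self._watchers = []

    def get(self, watcher=None):
        # Reading and (optionally) registering a watch happen in one step.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.value

    def set(self, value):
        self.value = value
        # One-shot semantics: each registered watcher fires once, then is dropped.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher()


def run(rearm):
    """Return the values a client observes across three updates."""
    store = TinyStore("v0")
    seen = []

    def on_change():
        # Re-reading WITH the watcher re-arms it; re-reading without does not.
        seen.append(store.get(on_change if rearm else None))

    seen.append(store.get(on_change))
    for value in ("v1", "v2", "v3"):
        store.set(value)
    return seen


print(run(rearm=True))
print(run(rearm=False))
```

In this toy model the client that re-arms its watch on every read observes all three updates, while the client that forgets to re-arm stops hearing about changes after the first one. A wrapper that hides the re-arm step makes this class of bug harder to spot.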
> My concern is that this is a problem we will have with every pluggable
> implementation. Today, a lot of our code depends on the sort of guarantees
> that ZooKeeper provides around watches and ordering. I don't know the other
> systems well enough to say whether they would be able to provide similar
> guarantees around all the operations we'd want to support.
>
> A better use of effort is to focus on fixing our use of ZooKeeper until we
> can come back and replace it with the native implementation.
>
> On Tue, Dec 1, 2015 at 12:25 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Yeah, let's do both! :) I always had trepidations about leaving things as
> > is with ZooKeeper there. Can we have this new internal system be what
> > replaces that but still make it somewhat modular?
> >
> > The problem with any new system is that everyone already trusts and relies
> > on the existing scars we know heal. That is why we all are still using
> > ZooKeeper (I bet at least 3 clusters are still on 3.3.4 and one maybe
> > 3.3.1 or something nutty).
> >
> > etcd
> > consul
> > c*
> > riak
> > akka
> >
> > All have viable solutions and I have no idea what will be best or worst or
> > even work, but lots of folks are working on it now trying to get things to
> > be different and work right for them.
> >
> > I think a native version should be there in the project and I am 100% on
> > board with that native version NOT being ZooKeeper but homegrown.
> >
> > I also think the native default should use the KIP-30 interface so that
> > other servers can plug in as well (that way deployments that have already
> > adopted XYZ for consensus can use it).
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - - - -
> >   http://www.elodina.net
> >     http://www.stealth.ly
> > - - - - - - - - - - - - - - - - - - -
> >
> > On Tue, Dec 1, 2015 at 2:58 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Hey Joe,
> > >
> > > Thanks for raising this. People really want to get rid of the ZK
> > > dependency, I agree it is among the most asked for things. Let me give a
> > > quick critique and a more radical plan.
> > >
> > > I don't think making ZK pluggable is the right thing to do. I have a lot of
> > > experience with this dynamic of introducing plugins for core functionality
> > > because I previously worked on a key-value store called Voldemort in which
> > > we made both the protocol and storage engine totally pluggable. I
> > > originally felt this was a good thing both philosophically and practically,
> > > but in retrospect came to believe it was a huge mistake--what people really
> > > wanted was one really excellent implementation with the kind of insane
> > > levels of in-production usage and test coverage that infrastructure
> > > demands. Pluggability is actually really at odds with this, and the ability
> > > to actually abstract over some really meaty dependency like a storage
> > > engine never quite works.
> > >
> > > People dislike the ZK dependency because it effectively doubles the
> > > operational load of Kafka--it doubles the amount of configuration,
> > > monitoring, and understanding needed. Replacing ZK with a similar system
> > > won't fix this problem though--all the other consensus services are equally
> > > complex (and often less mature)--and it will cause two new problems. First,
> > > there will be a layer of indirection that will make reasoning about and
> > > improving the ZK implementation harder. For example, note that your plug-in
> > > API doesn't seem to cover multi-get and multi-write; when we added that we
> > > would end up breaking all plugins. Each new thing will be like that. Ops
> > > tools, config, documentation, etc. will no longer be able to include any
> > > coverage of ZK because we can't assume ZK, so all that becomes much harder.
> > > The second problem is that this introduces a combinatorial testing problem.
> > > People say they want to swap out ZK but they are assuming whatever they
> > > swap in will work equally well. How will we know that is true? The only way
> > > is to explode out the testing to run with every possible plugin.
> > >
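A minimal sketch of the interface-evolution problem Jay raises above: once the core project adds a required bulk operation, a plug-in written against the older interface no longer loads. `MetadataStoreV1`, `MetadataStoreV2`, `make_plugin`, and `loads` are hypothetical names invented for this illustration; they are not the API proposed in KIP-30.

```python
from abc import ABC, abstractmethod


class MetadataStoreV1(ABC):
    """Hypothetical plug-in interface with single-key operations only."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...


class MetadataStoreV2(MetadataStoreV1):
    """The same interface after the core project adds a required bulk read."""

    @abstractmethod
    def multi_get(self, keys): ...


def make_plugin(interface):
    # A third-party plug-in written back when only get/put existed.
    class DictPlugin(interface):
        def __init__(self):
            self._data = {}

        def get(self, key):
            return self._data.get(key)

        def put(self, key, value):
            self._data[key] = value

    return DictPlugin


def loads(interface):
    """Return True if the old plug-in can still be instantiated."""
    try:
        make_plugin(interface)()
        return True
    except TypeError:
        # Python refuses to instantiate a class with unimplemented abstract methods.
        return False


print(loads(MetadataStoreV1))
print(loads(MetadataStoreV2))
```

The alternative, giving `multi_get` a default implementation that loops over `get`, keeps old plug-ins loading but quietly gives up whatever atomicity or performance the bulk call was meant to provide.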
> > > If you want to see this in action take a look at ActiveMQ. ActiveMQ is less
> > > a system than a family of co-operating plugins and a configuration language
> > > for assembling them. Software engineers and open source communities are
> > > really prone to this kind of thing because "we can just make it pluggable"
> > > ends any argument. But the actual implementation is a mess, and later
> > > improvements in their threading, I/O, and other core models simply couldn't
> > > be made across all the plugins.
> > >
> > > This blog post on configurability in UI is a really good summary of a
> > > similar dynamic:
> > > http://ometer.com/free-software-ui.html
> > >
> > > Anyhow, not to go too far off on a rant. Clearly I have plugin PTSD :-)
> > >
> > > I think instead we should explore the idea of getting rid of the ZooKeeper
> > > dependency and replacing it with an internal facility. Let me explain what I
> > > mean. In terms of API what Kafka and ZK do is super different, but
> > > internally it is actually quite similar--they are both trying to maintain a
> > > CP log.
> > >
> > > What would actually make the system significantly simpler would be to
> > > reimplement the facilities you describe on top of Kafka's existing
> > > infrastructure--using the same log implementation, network stack, config,
> > > monitoring, etc. If done correctly this would dramatically lower the
> > > operational load of the system versus the current Kafka+ZK or proposed
> > > Kafka+X.
> > >
> > > I don't have a proposal for how this would work and it's some effort to
> > > scope it out. The obvious thing to do would just be to keep the existing
> > > ISR/Controller setup and rebuild the controller etc on a RAFT/Paxos impl
> > > using the Kafka network/log/etc and have a replicated config database
> > > (maybe rocksdb) that was fed off the log and shared by all nodes.
> > >
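A minimal sketch of the "config database fed off the log" idea in Jay's paragraph above: every node applies the same committed entries in the same order and so ends up with the same key-value state. It leaves out the consensus layer (Raft/Paxos) entirely and uses a plain dict where Jay suggests something like RocksDB. The entry format and the key names are assumptions made up for this illustration.

```python
def apply_entry(state, entry):
    """Apply one committed log entry to a key-value state dict."""
    op, key, value = entry
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op}")
    return state


def replay(log, upto=None):
    """Rebuild state by applying the first `upto` entries in log order."""
    state = {}
    for entry in log[:upto]:
        apply_entry(state, entry)
    return state


# Hypothetical committed log of config changes.
log = [
    ("set", "topic.a.partitions", 8),
    ("set", "topic.b.partitions", 4),
    ("set", "topic.a.partitions", 16),
    ("delete", "topic.b.partitions", None),
]

node1 = replay(log)
node2 = replay(log)
print(node1 == node2, node1)
```

Because the state is a pure function of the log prefix, a node that falls behind can catch up by replaying from its last applied offset; the hard part that this sketch skips is agreeing on what the committed log is.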
> > > If done well this could have the advantage of potentially allowing us to
> > > scale the number of partitions quite significantly (the k/v store would not
> > > need to be all in memory), though you would likely still have limits on the
> > > number of partitions per machine. This would make the minimum Kafka cluster
> > > size be just your replication factor.
> > >
> > > People tend to feel that implementing things like RAFT or Paxos is too hard
> > > for mere mortals. But I actually think it is within our capabilities, and
> > > our testing capabilities as well as experience with this type of thing have
> > > improved to the point where we should not be scared off if it is the right
> > > path.
> > >
> > > This approach is likely more work than plugins (though maybe not, once you
> > > factor in all the docs, testing, etc) but if done correctly it would be an
> > > unambiguous step forward--a simpler, more scalable implementation with no
> > > operational dependencies.
> > >
> > > Thoughts?
> > >
> > > -Jay
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Dec 1, 2015 at 11:12 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> > >
> > > > I would like to start a discussion around the work that has started in
> > > > regards to KIP-30
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
> > > >
> > > > The impetus for working on this came largely from the community. For the
> > > > last year or so it has been the most asked question at any talk I have
> > > > given (personally speaking). It has also come up a bit on the mailing list
> > > > in the discussion of ZkClient vs Curator. A lot of folks want to use Kafka,
> > > > but introducing dependencies is hard for the enterprise, so the goal
> > > > behind this is to make using Kafka as easy as possible for the operations
> > > > teams. If they are already supporting ZooKeeper they can keep doing that,
> > > > but if not, users want to use something else they already support that
> > > > can plug in to do the same things.
> > > >
> > > > For the core project I think we should leave in upstream what we have.
> > > > This gives a great baseline regression for folks and makes the work for
> > > > "making what we have plug-able work" a well-defined task (carve out, layer
> > > > in API impl, push back, tests pass). From there, when folks want their
> > > > implementation to be something besides ZooKeeper, they can develop, test,
> > > > and support that if they choose.
> > > >
> > > > We would like to suggest that the plugin interface be Java based, to
> > > > minimize dependencies for JVM implementations. This could live in another
> > > > directory, name TBD: /<name>.
> > > >
> > > > If you have a server you want to try to get working but you aren't on
> > > > the JVM, don't be afraid: just think about a REST impl, and if you can work
> > > > inside of that you have some light RPC layers (this was the first-pass
> > > > prototype we did to flesh out the public API presented on the KIP).
> > > >
> > > > There are a lot of parts to working on this, and the more implementations
> > > > we have, the better we can flesh out the public interface. I will leave
> > > > the technical details and design to JIRA tickets that are linked through
> > > > the confluence page. As these decisions come about and code starts going
> > > > up for review, we can target the specific modules; having the context
> > > > separate is helpful, especially if multiple folks are working on it.
> > > > https://issues.apache.org/jira/browse/KAFKA-2916
> > > >
> > > > Do other folks want to build implementations? Maybe we should start a
> > > > confluence page for those, or use an existing one and add to it, so we can
> > > > coordinate some there too.
> > > >
> > > > Thanks!
> > > >
> > > > ~ Joe Stein
> > > > - - - - - - - - - - - - - - - - - - -
> > > >   http://www.elodina.net
> > > >     http://www.stealth.ly
> > > > - - - - - - - - - - - - - - - - - - -
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Neha
>
