I share Jay's concerns around plugins; he explained them very well, so I won't repeat them here.
I agree there has been some feedback about getting rid of ZooKeeper altogether, and I'm on board with building a native version whenever we are ready to take that on. But I don't think the pain is big enough to solve this with a short-term solution via plugins.

With respect to ZooKeeper, a more significant issue is performance and correct use of the ZooKeeper APIs. First, switching to the bulk read and write APIs that ZooKeeper released a while ago will make a lot of things around failover better and faster. Second, there is value in not wrapping the core ZooKeeper APIs in third-party libraries (whether ZkClient or Curator), since they try to mask functionality in ways that end up making it very tricky to write correct code. Case in point: https://issues.apache.org/jira/browse/KAFKA-1387. I suspect there are places in our code that still don't use the ZK watcher functionality correctly, with the result that we lose important notifications and don't act on certain state changes at all. This is because we depend on ZkClient, and it ends up hiding details of the ZooKeeper API that need to be exposed to handle such cases correctly (there is a small sketch of both points at the very end of this mail).

My concern is that this is a problem we will have with every pluggable implementation. Today, a lot of our code depends on the sort of guarantees that ZooKeeper provides around watches and ordering. I don't know the other systems well enough to say whether they would be able to provide similar guarantees around all the operations we'd want to support.

A better use of effort is to focus on fixing our use of ZooKeeper until we can come back and replace it with a native implementation.

On Tue, Dec 1, 2015 at 12:25 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> Yeah, let's do both! :) I always had trepidations about leaving things as-is with ZooKeeper there. Can we have this new internal system be what replaces that, but still keep it somewhat modular?
>
> The problem with any new system is that everyone already trusts and relies on the existing one, whose scars we know will heal. That is why we are all still using ZooKeeper (I bet at least 3 clusters are still on 3.3.4, and maybe one is on 3.3.1 or something nutty).
>
> etcd
> consul
> c*
> riak
> akka
>
> All have viable solutions, and I have no idea which will be best or worst or even work, but lots of folks are working on it now, trying to get things to be different and work right for them.
>
> I think a native version should be there in the project, and I am 100% on board with that native version NOT being ZooKeeper but homegrown.
>
> I also think the native default should use the KIP-30 interface so other servers can also plug in for the feature they are solving (that way, deployments that have already adopted XYZ for consensus can use it).
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - - - -
> http://www.elodina.net
> http://www.stealth.ly
> - - - - - - - - - - - - - - - - - - -
>
> On Tue, Dec 1, 2015 at 2:58 PM, Jay Kreps <j...@confluent.io> wrote:
>
> > Hey Joe,
> >
> > Thanks for raising this. People really want to get rid of the ZK dependency; I agree it is among the most asked-for things. Let me give a quick critique and a more radical plan.
> >
> > I don't think making ZK pluggable is the right thing to do. I have a lot of experience with this dynamic of introducing plugins for core functionality, because I previously worked on a key-value store called Voldemort in which we made both the protocol and storage engine totally pluggable.
> > I originally felt this was a good thing both philosophically and practically, but in retrospect I came to believe it was a huge mistake--what people really wanted was one really excellent implementation with the kind of insane levels of in-production usage and test coverage that infrastructure demands. Pluggability is really at odds with this, and the ability to truly abstract over some really meaty dependency like a storage engine never quite works.
> >
> > People dislike the ZK dependency because it effectively doubles the operational load of Kafka--it doubles the amount of configuration, monitoring, and understanding needed. Replacing ZK with a similar system won't fix this problem, though--all the other consensus services are equally complex (and often less mature)--and it will cause two new problems. First, there will be a layer of indirection that will make reasoning about and improving the ZK implementation harder. For example, note that your plug-in API doesn't seem to cover multi-get and multi-write; when we added that, we would end up breaking all plugins. Each new thing will be like that. Ops tools, config, documentation, etc. will no longer be able to include any coverage of ZK, because we can't assume ZK, so all of that becomes much harder. The second problem is that this introduces a combinatorial testing problem. People say they want to swap out ZK, but they are assuming whatever they swap in will work equally well. How will we know that is true? The only way is to explode out the testing to run with every possible plugin.
> >
> > If you want to see this in action, take a look at ActiveMQ. ActiveMQ is less a system than a family of co-operating plugins and a configuration language for assembling them. Software engineers and open source communities are really prone to this kind of thing because "we can just make it pluggable" ends any argument. But the actual implementation is a mess, and later improvements in their threading, I/O, and other core models simply couldn't be made across all the plugins.
> >
> > This blog post on configurability in UI is a really good summary of a similar dynamic:
> > http://ometer.com/free-software-ui.html
> >
> > Anyhow, not to go too far off on a rant. Clearly I have plugin PTSD :-)
> >
> > I think instead we should explore the idea of getting rid of the ZooKeeper dependency and replacing it with an internal facility. Let me explain what I mean. In terms of API, what Kafka and ZK do is super different, but internally they are actually quite similar--they are both trying to maintain a CP log.
> >
> > What would actually make the system significantly simpler would be to reimplement the facilities you describe on top of Kafka's existing infrastructure--using the same log implementation, network stack, config, monitoring, etc. If done correctly, this would dramatically lower the operational load of the system versus the current Kafka+ZK or the proposed Kafka+X.
> >
> > I don't have a proposal for how this would work, and it's some effort to scope it out. The obvious thing to do would be to keep the existing ISR/controller setup and rebuild the controller etc. on a Raft/Paxos implementation using the Kafka network/log/etc., and have a replicated config database (maybe RocksDB) that is fed off the log and shared by all nodes.
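[To make the "replicated config database fed off the log" idea above concrete, here is a rough, purely illustrative sketch of a broker-local store materialized from such an internal metadata log. Every class and method name below is made up, not a proposal for an actual API; the point is just that writes go through the replicated log while reads are served from a local copy.]

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch only: a broker-local metadata store materialized from an
 * internal replicated log. The controller appends changes to the log; every
 * broker tails the log and applies entries here, so reads never need a round
 * trip to an external consensus service.
 */
public class LogBackedMetadataStore {

    // Stand-in for RocksDB or another local k/v store.
    private final Map<String, byte[]> state = new ConcurrentHashMap<>();
    private long appliedOffset = -1L;

    /** Apply one committed log record, in offset order; replays are no-ops. */
    public synchronized void apply(long offset, String key, byte[] value) {
        if (offset <= appliedOffset) {
            return; // already applied (e.g. replay after restart)
        }
        if (value == null) {
            state.remove(key); // tombstone means delete
        } else {
            state.put(key, value);
        }
        appliedOffset = offset;
    }

    /** Reads are served entirely from the local materialized copy. */
    public byte[] get(String key) {
        return state.get(key);
    }

    /** Highest log offset applied so far (useful for snapshots and catch-up). */
    public synchronized long appliedOffset() {
        return appliedOffset;
    }
}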
> >
> > If done well, this could have the advantage of potentially allowing us to scale the number of partitions quite significantly (the k/v store would not need to be all in memory), though you would likely still have limits on the number of partitions per machine. This would make the minimum Kafka cluster size just your replication factor.
> >
> > People tend to feel that implementing things like Raft or Paxos is too hard for mere mortals. But I actually think it is within our capabilities, and our testing capabilities as well as our experience with this type of thing have improved to the point where we should not be scared off if it is the right path.
> >
> > This approach is likely more work than plugins (though maybe not, once you factor in all the docs, testing, etc.), but if done correctly it would be an unambiguous step forward--a simpler, more scalable implementation with no operational dependencies.
> >
> > Thoughts?
> >
> > -Jay
> >
> > On Tue, Dec 1, 2015 at 11:12 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> >
> > > I would like to start a discussion around the work that has started in regards to KIP-30:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
> > >
> > > The impetus for working on this came in large part from the community. For the last year (or more) it has been the most-asked question at any talk I have given (personally speaking). It has also come up a bit on the mailing list in discussions about ZkClient vs Curator. A lot of folks want to use Kafka, but introducing dependencies is hard for the enterprise, so the goal behind this is making Kafka as easy as possible for operations teams to run. If they are already supporting ZooKeeper they can keep doing that, but if not, users want to plug in something else they are already supporting to do the same things.
> > >
> > > For the core project I think we should leave upstream what we have. This gives a great baseline regression for folks and makes the work of "making what we have pluggable" a well-defined task (carve out, layer in the API impl, push back once tests pass). From there, when folks want their implementation to be something besides ZooKeeper, they can develop, test, and support that if they choose.
> > >
> > > We would like to suggest that the plugin interface be Java-based to minimize dependencies for JVM implementations. This could live in another directory, something TBD, /<name>.
> > >
> > > If you have a server you want to try to get working but you aren't on the JVM, don't be afraid: think about a REST impl, and if you can work inside of that you have some light RPC layers (this was the first-pass prototype we did to flesh out the public API presented in the KIP).
> > >
> > > There are a lot of parts to working on this, and the more implementations we have, the better we can flesh out the public interface. I will leave the technical details and design to the JIRA tickets linked through the Confluence page as those decisions come about and code starts going up for review; we can target the specific modules there, since having the context separate is helpful, especially if multiple folks are working on it.
> > > https://issues.apache.org/jira/browse/KAFKA-2916
> > >
> > > Do other folks want to build implementations? Maybe we should start a Confluence page for those, or use an existing one and add to it, so we can coordinate some there too.
> > >
> > > Thanks!
> > >
> > > ~ Joe Stein
> > > - - - - - - - - - - - - - - - - - - -
> > > http://www.elodina.net
> > > http://www.stealth.ly
> > > - - - - - - - - - - - - - - - - - - -

--
Thanks,
Neha
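P.S. A quick sketch of the two raw-ZooKeeper points at the top of this mail (bulk operations via multi(), and one-shot watches). This is illustrative only, not Kafka code; the znode paths and payloads are made up, and the bulk example only shows the write side:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class RawZkExamples {

    // Bulk writes: multi() applies all ops atomically in one round trip instead
    // of a sequence of individual setData/create calls during failover.
    static List<OpResult> commitLeaderChanges(ZooKeeper zk) throws Exception {
        byte[] state0 = "{\"leader\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] state1 = "{\"leader\":2}".getBytes(StandardCharsets.UTF_8);
        return zk.multi(Arrays.asList(
                Op.setData("/brokers/topics/t/partitions/0/state", state0, -1),
                Op.setData("/brokers/topics/t/partitions/1/state", state1, -1),
                Op.create("/example/marker", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
    }

    // Watches are one-shot: after a notification fires you must re-register the
    // watch AND re-read the data, because changes that happen in between are not
    // delivered. Wrappers that hide this make it easy to silently miss state changes.
    static void watchControllerEpoch(ZooKeeper zk) throws Exception {
        Watcher watcher = event -> {
            try {
                watchControllerEpoch(zk); // re-register the watch and re-read the state
            } catch (Exception e) {
                // handle connection loss / session expiration explicitly here
            }
        };
        Stat stat = new Stat();
        byte[] data = zk.getData("/controller_epoch", watcher, stat);
        System.out.println("epoch=" + new String(data, StandardCharsets.UTF_8)
                + " mzxid=" + stat.getMzxid());
    }
}

The second method is the pattern that the wrappers tend to hide: every notification has to be followed by a re-registration plus a re-read, or state changes can be missed entirely.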