+1 on what Neha said.

On Tue, Dec 1, 2015 at 1:41 PM, Neha Narkhede <n...@confluent.io> wrote:
> I share Jay's concerns around plugins and he explained that very well, so I will avoid repeating those.
>
> Agree that there is some feedback about getting rid of ZooKeeper altogether, and I'm on board with building a native version whenever we are ready to take this on. But I don't think the pain is big enough to solve this by providing a short-term solution via plugins.
>
> With respect to ZooKeeper, a more significant issue is performance and correct use of the ZooKeeper APIs. First, switching to the bulk read and write APIs that ZooKeeper released a while ago will make a lot of things around failover better and faster. Second, there is value in not wrapping the core ZooKeeper APIs in third-party plugins (whether it is ZkClient or Curator), since they try to mask functionality in ways that end up making it very tricky to write correct code. Case in point: https://issues.apache.org/jira/browse/KAFKA-1387. I suspect there are places in our code that still don't use the ZK watcher functionality correctly, with the impact being that we lose important notifications and do not act on certain state changes at all. This is because we depend on ZkClient, and it ends up hiding some details of the ZooKeeper API that shouldn't be hidden if we are to handle such cases correctly.
>
> My concern is that this is a problem we will have with every pluggable implementation. Today, a lot of our code depends on the sort of guarantees that ZooKeeper provides around watches and ordering. I don't know the other systems well enough to say whether they would be able to provide similar guarantees around all the operations we'd want to support.
>
> A better use of effort is to focus on fixing our use of ZooKeeper until we can come back and replace it with the native implementation.
>
> On Tue, Dec 1, 2015 at 12:25 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > Yeah, let's do both! :) I always had trepidations about leaving things as is with ZooKeeper there. Can we have this new internal system be what replaces that, but still keep it somewhat modular?
> >
> > The problem with any new system is that everyone already trusts and relies on the existing scars we know heal. That is why we all are still using ZooKeeper (I bet at least 3 clusters are still on 3.3.4 and one maybe 3.3.1 or something nutty).
> >
> > etcd
> > consul
> > c*
> > riak
> > akka
> >
> > All have viable solutions and I have no idea what will be best or worst or even work, but lots of folks are working on it now, trying to get things to be different and work right for them.
> >
> > I think a native version should be there in the project and I am 100% on board with that native version NOT being ZooKeeper but homegrown.
> >
> > I also think the native default should use the KIP-30 interface so other servers solving the same problem can also be plugged in (that way deployments that have already adopted XYZ for consensus can use it).
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - - - -
> > http://www.elodina.net
> > http://www.stealth.ly
> > - - - - - - - - - - - - - - - - - - -
> >
> > On Tue, Dec 1, 2015 at 2:58 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Hey Joe,
> > >
> > > Thanks for raising this. People really want to get rid of the ZK dependency; I agree it is among the most asked-for things. Let me give a quick critique and a more radical plan.
> > >
> > > I don't think making ZK pluggable is the right thing to do. I have a lot of experience with this dynamic of introducing plugins for core functionality because I previously worked on a key-value store called Voldemort in which we made both the protocol and storage engine totally pluggable. I originally felt this was a good thing both philosophically and practically, but in retrospect came to believe it was a huge mistake--what people really wanted was one really excellent implementation with the kind of insane levels of in-production usage and test coverage that infrastructure demands. Pluggability is actually really at odds with this, and the ability to actually abstract over some really meaty dependency like a storage engine never quite works.
> > >
> > > People dislike the ZK dependency because it effectively doubles the operational load of Kafka--it doubles the amount of configuration, monitoring, and understanding needed. Replacing ZK with a similar system won't fix this problem though--all the other consensus services are equally complex (and often less mature)--and it will cause two new problems. First, there will be a layer of indirection that will make reasoning about and improving the ZK implementation harder. For example, note that your plug-in API doesn't seem to cover multi-get and multi-write; when we added that we would end up breaking all plugins. Each new thing will be like that. Ops tools, config, documentation, etc. will no longer be able to include any coverage of ZK because we can't assume ZK, so all that becomes much harder. The second problem is that this introduces a combinatorial testing problem. People say they want to swap out ZK but they are assuming whatever they swap in will work equally well. How will we know that is true? The only way is to explode out the testing to run with every possible plugin.
> > >
> > > If you want to see this in action take a look at ActiveMQ. ActiveMQ is less a system than a family of co-operating plugins and a configuration language for assembling them. Software engineers and open source communities are really prone to this kind of thing because "we can just make it pluggable" ends any argument. But the actual implementation is a mess, and later improvements in their threading, I/O, and other core models simply couldn't be made across all the plugins.
> > >
> > > This blog post on configurability in UI is a really good summary of a similar dynamic:
> > > http://ometer.com/free-software-ui.html
> > >
> > > Anyhow, not to go too far off on a rant. Clearly I have plugin PTSD :-)
> > >
> > > I think instead we should explore the idea of getting rid of the ZooKeeper dependency and replacing it with an internal facility. Let me explain what I mean. In terms of API what Kafka and ZK do is super different, but internally it is actually quite similar--they are both trying to maintain a CP log.
> > >
> > > What would actually make the system significantly simpler would be to reimplement the facilities you describe on top of Kafka's existing infrastructure--using the same log implementation, network stack, config, monitoring, etc. If done correctly this would dramatically lower the operational load of the system versus the current Kafka+ZK or proposed Kafka+X.
> > >
> > > I don't have a proposal for how this would work and it's some effort to scope it out. The obvious thing to do would just be to keep the existing ISR/Controller setup and rebuild the controller etc. on a Raft/Paxos impl using the Kafka network/log/etc. and have a replicated config database (maybe RocksDB) that was fed off the log and shared by all nodes.
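A replicated config database "fed off the log and shared by all nodes" is essentially state machine replication: consensus fixes the order of entries, and every node deterministically applies the same committed entries to its own local store, so all nodes converge on the same state. The minimal Java sketch below assumes the consensus layer (Raft/Paxos) has already produced the committed, ordered entries and does not implement it; `LogEntry`, `ConfigStateMachine`, and the ZooKeeper-like key paths are hypothetical names for illustration, and an in-memory `TreeMap` stands in for RocksDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: consensus (Raft/Paxos) is NOT implemented here; we assume it
// has already produced the committed, totally ordered list of entries.
public class ReplicatedConfigSketch {

    // One committed entry in a hypothetical metadata log: a write, or a delete when value is null.
    static final class LogEntry {
        final long offset;
        final String key;
        final String value;

        LogEntry(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    // Every node owns a local copy of the config state and changes it ONLY by
    // applying committed entries in offset order (TreeMap stands in for RocksDB).
    static final class ConfigStateMachine {
        private final Map<String, String> state = new TreeMap<>();
        private long lastApplied = -1;

        void apply(LogEntry entry) {
            if (entry.offset <= lastApplied) {
                return; // already applied, so replaying the log after a restart is harmless
            }
            if (entry.offset != lastApplied + 1) {
                throw new IllegalStateException("gap in log before offset " + entry.offset);
            }
            if (entry.value == null) {
                state.remove(entry.key);
            } else {
                state.put(entry.key, entry.value);
            }
            lastApplied = entry.offset;
        }

        String get(String key) {
            return state.get(key);
        }

        Map<String, String> snapshot() {
            return new TreeMap<>(state);
        }

        long lastApplied() {
            return lastApplied;
        }
    }

    public static void main(String[] args) {
        List<LogEntry> committed = new ArrayList<>();
        committed.add(new LogEntry(0, "/topics/orders/partitions", "3"));
        committed.add(new LogEntry(1, "/topics/orders/leader/0", "broker-1"));
        committed.add(new LogEntry(2, "/topics/orders/leader/0", "broker-2"));
        committed.add(new LogEntry(3, "/topics/tmp/partitions", "1"));
        committed.add(new LogEntry(4, "/topics/tmp/partitions", null));

        ConfigStateMachine nodeA = new ConfigStateMachine();
        ConfigStateMachine nodeB = new ConfigStateMachine();
        committed.forEach(nodeA::apply);
        committed.forEach(nodeB::apply);
        committed.forEach(nodeB::apply); // simulated replay after a restart: no effect

        System.out.println(nodeA.snapshot().equals(nodeB.snapshot())); // true
        System.out.println(nodeA.get("/topics/orders/leader/0"));       // broker-2
    }
}
```

Keeping `apply` deterministic and rejecting gaps is what lets a node rebuild its state simply by replaying the log. A real design would also need snapshots or log compaction so that replay does not grow without bound; both are left out of this sketch.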
> > >
> > > If done well this could have the advantage of potentially allowing us to scale the number of partitions quite significantly (the k/v store would not need to be all in memory), though you would likely still have limits on the number of partitions per machine. This would make the minimum Kafka cluster size be just your replication factor.
> > >
> > > People tend to feel that implementing things like Raft or Paxos is too hard for mere mortals. But I actually think it is within our capabilities, and our testing capabilities as well as experience with this type of thing have improved to the point where we should not be scared off if it is the right path.
> > >
> > > This approach is likely more work than plugins (though maybe not, once you factor in all the docs, testing, etc.) but if done correctly it would be an unambiguous step forward--a simpler, more scalable implementation with no operational dependencies.
> > >
> > > Thoughts?
> > >
> > > -Jay
> > >
> > > On Tue, Dec 1, 2015 at 11:12 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> > >
> > > > I would like to start a discussion around the work that has started in regards to KIP-30
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
> > > >
> > > > The impetus for working on this came a lot from the community. For the last year (~+) it has been the most asked question at any talk I have given (personally speaking). It has also come up a bit on the mailing list, in discussions about ZkClient vs Curator. A lot of folks want to use Kafka, but introducing dependencies is hard for the enterprise, so the goal behind this is to make Kafka as easy as possible for operations teams to run. If they are already supporting ZooKeeper they can keep doing that, but if not, they (users) want to use something else they are already supporting that can plug in to do the same things.
> > > >
> > > > For the core project I think we should leave in upstream what we have. This gives a great baseline regression for folks and makes the work for "making what we have plug-able work" a well-defined task (carve out, layer in API impl, push back tests pass). From there, when folks want their implementation to be something besides ZooKeeper, they can develop, test and support that if they choose.
> > > >
> > > > We would like to suggest that the plugin interface be Java based, to minimize dependencies for JVM implementations. This could be in another directory, something TBD /<name>.
> > > >
> > > > If you have a server you want to get working but you aren't on the JVM, don't be afraid: just think about a REST impl, and if you can work inside of that you have some light RPC layers (this was the first-pass prototype we did to flesh out the public API presented on the KIP).
> > > >
> > > > There are a lot of parts to working on this, and the more implementations we have the better we can flesh out the public interface. I will leave the technical details and design to JIRA tickets, linked from the Confluence page, as these decisions come about and code starts coming in for review. That way we can target the specific modules; having the context separate is helpful, especially if multiple folks are working on it.
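To make the idea of a Java-based plugin interface for consensus and metadata storage concrete, here is a minimal sketch of what such a contract could look like. The interface name `MetadataStore`, its three methods, and the in-memory implementation are hypothetical illustrations, not the interface on the KIP-30 page. The sketch also reflects Neha's point that semantics matter as much as signatures: the watch here is one-shot, like ZooKeeper's standard one-time watches, and every plugin would have to reproduce exactly that behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;

public class PluggableMetadataSketch {

    // Hypothetical plugin contract, NOT the interface on the KIP-30 page. The comments
    // are part of the contract: every plugin must honour these semantics, not just compile.
    interface MetadataStore {
        Optional<String> get(String path);

        void put(String path, String value);

        // One-shot watch: fires at most once, on the next put to path (like ZooKeeper's
        // standard one-time watches). Callers must re-register to see later changes.
        void watchOnce(String path, Consumer<String> onChange);
    }

    // Single-process reference implementation, e.g. as a baseline for plugin tests.
    static final class InMemoryMetadataStore implements MetadataStore {
        private final Map<String, String> data = new HashMap<>();
        private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

        @Override
        public synchronized Optional<String> get(String path) {
            return Optional.ofNullable(data.get(path));
        }

        @Override
        public void put(String path, String value) {
            List<Consumer<String>> toFire;
            synchronized (this) {
                data.put(path, value);
                // Removing the watchers before firing them is what makes them one-shot.
                toFire = watches.remove(path);
            }
            if (toFire != null) {
                // Callbacks run outside the lock so they may call back into the store.
                for (Consumer<String> listener : toFire) {
                    listener.accept(value);
                }
            }
        }

        @Override
        public synchronized void watchOnce(String path, Consumer<String> onChange) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(onChange);
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new InMemoryMetadataStore();
        List<String> seen = new ArrayList<>();
        store.watchOnce("/controller", seen::add);

        store.put("/controller", "broker-1"); // fires the watch
        store.put("/controller", "broker-2"); // watch already consumed, nothing fires

        System.out.println(seen);                           // [broker-1]
        System.out.println(store.get("/controller").get()); // broker-2
    }
}
```

The interesting part is what the signatures leave out. Batched or atomic multi-path writes (the multi-get/multi-write gap Jay mentions) and the ordering of notifications would each have to be added to the contract later, and every existing implementation would then have to match them.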
> > > >
> > > > https://issues.apache.org/jira/browse/KAFKA-2916
> > > >
> > > > Do other folks want to build implementations? Maybe we should start a Confluence page for those, or use an existing one and add to it, so we can coordinate some there too.
> > > >
> > > > Thanks!
> > > >
> > > > ~ Joe Stein
>
> --
> Thanks,
> Neha