+1 to zk bootstrap + close as an option at least
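
A minimal sketch of the "bootstrap + close" idea, not an existing Kafka client feature. It assumes brokers register as JSON with `host` and `port` fields under `/brokers/ids/<id>`, and that `zk` is any client object exposing `get_children`, `get` and `stop` (kazoo's `KazooClient` has methods with these names). `bootstrap_broker_list` and `FakeZk` are hypothetical names used only for illustration.

```python
import json


def bootstrap_broker_list(zk, ids_path="/brokers/ids"):
    """Read the broker registrations once, then end the ZooKeeper session.

    Assumption: each child znode holds JSON with "host" and "port" keys.
    """
    try:
        brokers = []
        for broker_id in sorted(zk.get_children(ids_path), key=int):
            data, _stat = zk.get(f"{ids_path}/{broker_id}")
            info = json.loads(data.decode("utf-8"))
            brokers.append(f"{info['host']}:{info['port']}")
        return ",".join(brokers)
    finally:
        # Close right after the bootstrap so no long-lived session remains.
        zk.stop()


class FakeZk:
    """In-memory stand-in for a ZooKeeper client, for demonstration only."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.stopped = False

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return [p[len(prefix):] for p in self.nodes if p.startswith(prefix)]

    def get(self, path):
        return self.nodes[path], None

    def stop(self):
        self.stopped = True
```

The returned string could then be passed to the producer as its static broker list, so the producer itself never holds a ZooKeeper connection.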

On Tue, Jan 28, 2014 at 10:09 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> >> The producer since 0.8 is actually zookeeper free, so this is not new to
> >> this client; it is true for the current client as well. Our experience was
> >> that direct zookeeper connections from zillions of producers wasn't a good
> >> idea for a number of reasons.
>
> The problem with several thousand connections to zookeeper is mainly the
> long lived sessions causing overhead on zookeeper.
> This further degrades zookeeper performance, causing it to be flaky and
> expire sessions/disconnect clients and so on. That being said,
> I don't see why we can't use zookeeper *just* for the bootstrap on client
> startup and close the connection right after the bootstrap is done.
> IMO, this is more intuitive and convenient as it will allow users to use the
> same "bootstrap config" across producers, consumers and brokers and
> will not cause any performance/operational issues on zookeeper. This is
> assuming that all the zillion clients don't bootstrap at the same time,
> which is rare in practice.
>
> Thanks,
> Neha
>
>
> On Tue, Jan 28, 2014 at 8:02 AM, Mattijs Ugen (DT) <matt...@holmes.nl> wrote:
>
> > Sorry to tune in a bit late, but here goes.
> >
> > > 1. The producer since 0.8 is actually zookeeper free, so this is not new
> > > to this client; it is true for the current client as well. Our experience
> > > was that direct zookeeper connections from zillions of producers wasn't a
> > > good idea for a number of reasons. Our intention is to remove this
> > > dependency from the consumer as well. The configuration in the producer
> > > doesn't need the full set of brokers, though, just one or two machines to
> > > bootstrap the state of the cluster from--in other words it isn't like you
> > > need to reconfigure your clients every time you add some servers. This is
> > > exactly how zookeeper works too--if we used zookeeper you would need to
> > > give a list of zk urls in case a particular zk server was down. Basically
> > > either way you need a few statically configured nodes to go to discover
> > > the full state of the cluster. For people who don't like hard coding hosts
> > > you can use a VIP or dns or something instead.
> >
> > In our configuration, the zookeeper quorum is actually one of the few
> > stable (in the sense of host names / ip addresses) pillars of the
> > complete ecosystem: every distributed service uses zookeeper to
> > coordinate the hosts that make up the service as a whole. Considering
> > that the kafka cluster will save the information needed for this
> > bootstrap to zookeeper anyhow, having clients (either producers or
> > consumers) retrieve this information at first use makes sense to me.
> >
> > We could create a routine that retrieves a list of brokers from zookeeper
> > before initializing a Producer, but that feels more like a workaround
> > for a feature that in my humble opinion could well be part of the kafka
> > client library. That said, I realise that having two options for
> > connection bootstrapping (assuming that hardcoding a list of brokers is
> > here to stay) could be confusing for new users, but bypassing zookeeper
> > for this was rather confusing for me when I first came across it :)
> >
> > So, in short, I'd love it if the option to bootstrap the broker list
> > from zookeeper was there, rather than being required to configure additional
> > (moving) virtual hostnames or fixed ip addresses for producers in our
> > cluster setup. I've been baffled a few times by this option not being
> > available for a distributed service that coordinates itself through
> > zookeeper.
> >
> > Just my two cents :)
> >
> > Mattijs