OK, I understand. So for the ZooKeeper cluster, can I go with something like:

3 x Dell R320: single hex-core 2.5GHz Xeon, 32GB RAM, 4x 10K 300GB SAS drives, 10GbE

And if I do, can I drop the CPU specs on the broker machines to, say, dual 6-core CPUs? Or are we looking at something that is core-bound here?

Thanks,
Ken

On Mar 15, 2014, at 11:09 AM, Ray Rodriguez <rayrod2...@gmail.com> wrote:

> Imagine a situation where one of your nodes running a Kafka broker and a
> ZooKeeper node goes down. You now have to contend with two distributed
> systems that both need to recover: leader election and consensus in the
> ZooKeeper ensemble, and partition rebalancing/repair in the Kafka cluster.
> So I think Jun's point is that, when running distributed systems, you
> should try as much as possible to keep them off the same nodes to achieve
> better fault tolerance and high availability.
>
> From the Kafka docs you can see that a ZooKeeper cluster doesn't need to
> sit on very powerful hardware to be reliable, so I believe the suggestion
> is to run a small, independent ZooKeeper cluster for Kafka. By all means,
> don't hesitate to reuse that ZooKeeper ensemble for other systems, as long
> as you can guarantee that every system using the ZK ensemble uses its own
> znode root to keep its data separated within the ZooKeeper znode tree.
>
> This is an interesting topic, and I'd love to hear whether anyone else is
> running ZK alongside their Kafka brokers in production.
>
> Ray
>
>
> On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
>
>> I'd rather not purchase dedicated hardware for ZK if I don't absolutely
>> have to, unless I can use it for multiple clusters (i.e., Kafka, HBase,
>> and other things that rely on ZK). Would adding more cores help with ZK on
>> the same machine? Or is that just a waste of cores, considering that it's
>> Java under all of this?
>>
>> --Ken
>>
>> On Mar 15, 2014, at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:
>>
>>> The spec looks reasonable. If you have other machines, it may be better
>>> to put ZK on its own machines.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>> On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm looking at setting up a (small) Kafka cluster for streaming
>>>> microscope data to Spark Streaming.
>>>>
>>>> The producer would be a single Windows 7 machine with a 1Gb or 10Gb
>>>> Ethernet connection running HTTP POSTs from MATLAB (this bit is a little
>>>> fuzzy; I'm not the user, I'm an admin). The consumers would be 10-60 (or
>>>> more) Linux nodes running Spark Streaming with 10Gb Ethernet connections.
>>>> The target data rate per the user is <200MB/sec, although I can see this
>>>> scaling in the future.
>>>>
>>>> Based on the documentation, my initial thoughts were as follows:
>>>>
>>>> 3 nodes, all running ZK and the broker
>>>>
>>>> Dell R620
>>>> 2x 8-core 2.6GHz Xeon
>>>> 256GB RAM
>>>> 8x 300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
>>>> 10Gb Ethernet (single port)
>>>>
>>>> Do these specs make sense? Am I over- or under-speccing in any of the
>>>> areas? It made sense to me to make the filesystem cache as large as
>>>> possible, particularly when I'm dealing with a small number of brokers.
>>>>
>>>> Thanks,
>>>> Ken Carlile
>>>> Senior Unix Engineer, Scientific Computing Systems
>>>> Janelia Farm Research Campus, HHMI
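A quick back-of-envelope check of the figures quoted in this thread (the <200MB/sec target, 1Gb vs 10Gb links, 256GB of RAM per broker) may help frame the sizing questions. This is a minimal Python sketch under stated assumptions, not a sizing method taken from the thread: it assumes decimal units, ignores protocol overhead and replication traffic, and treats all of a broker's RAM as page cache for the full stream. The function names are hypothetical.

```python
# Back-of-envelope sizing sketch for the numbers quoted in this thread.
# Assumptions (not from the thread): decimal units (1 MB = 10**6 bytes,
# 1 GB = 10**9 bytes), no protocol overhead on the wire, and the full
# stream landing in a single broker's page cache.

def link_capacity_mb_per_s(gbit_per_s: float) -> float:
    """Raw payload ceiling of an Ethernet link in MB/s (8 bits per byte)."""
    return gbit_per_s * 1000 / 8

def cache_window_seconds(cache_gb: float, rate_mb_per_s: float) -> float:
    """Seconds of incoming data that fit in a page cache of cache_gb."""
    return cache_gb * 1000 / rate_mb_per_s

TARGET_RATE_MB_S = 200  # user's stated target (<200MB/sec)

# Can the producer's link carry the target rate at all?
for link in (1, 10):
    cap = link_capacity_mb_per_s(link)
    verdict = "ok" if cap >= TARGET_RATE_MB_S else "too slow"
    print(f"{link}GbE ceiling: {cap:.0f} MB/s -> {verdict} for {TARGET_RATE_MB_S} MB/s")

# Upper bound: in practice the OS, the JVM heap, and any co-located ZK
# also take a share of the 256GB.
window = cache_window_seconds(256, TARGET_RATE_MB_S)
print(f"256GB cache holds ~{window / 60:.0f} minutes at {TARGET_RATE_MB_S} MB/s")
```

Under these assumptions, a 1Gb link tops out around 125 MB/s, below the 200MB/sec target, while 256GB of cache holds roughly 21 minutes of incoming data, which bounds how far behind a consumer can fall before reads go to disk.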