Ray,

We are, for SPM <http://sematext.com/spm>. On c1.medium instances, I believe, we have:

* Jetty receiving tens of thousands of metrics per second (in batches, so the rate of HTTP requests is lower than that number)
* Kafka brokers
* ZK instances
So far we have not had issues with this. Knock on wood. Disk IO is not high, nor is the CPU usage.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Sat, Mar 15, 2014 at 11:09 AM, Ray Rodriguez <rayrod2...@gmail.com> wrote:

> Imagine a situation where one of your nodes running a kafka broker and
> zookeeper node goes down. You now have to contend with two distributed
> systems that need to recover: leader election and consensus in the case
> of a zookeeper ensemble, and partition rebalancing/repair in the case of
> a kafka cluster. So I think Jun's point is that when running distributed
> systems, try to isolate them as much as possible from running on the
> same node to achieve better fault tolerance and high availability.
>
> From the Kafka docs you can see that a zookeeper cluster doesn't need to
> sit on very powerful hardware to be reliable, so I believe the
> suggestion is to run a small independent zookeeper cluster that will be
> used by kafka. By all means don't hesitate to reuse that zookeeper
> ensemble for other systems, as long as you can guarantee that all the
> systems using the zk ensemble use some form of znode root to keep their
> data separated within the zookeeper znode directory structure.
>
> This is an interesting topic and I'd love to hear if anyone else is
> running their zk alongside their kafka brokers in production?
>
> Ray
>
>
> On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
>
> > I'd rather not purchase dedicated hardware for ZK if I don't
> > absolutely have to, unless I can use it for multiple clusters (i.e.
> > Kafka, HBase, other things that rely on ZK). Would adding more cores
> > help with ZK on the same machine? Or is that just a waste of cores,
> > considering that it's java under all of this?
> >
> > --Ken
> >
> > On Mar 15, 2014, at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > The spec looks reasonable. If you have other machines, it may be
> > > better to put ZK on its own machines.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm looking at setting up a (small) Kafka cluster for streaming
> > >> microscope data to Spark-Streaming.
> > >>
> > >> The producer would be a single Windows 7 machine with a 1Gb or 10Gb
> > >> ethernet connection running http posts from Matlab (this bit is a
> > >> little fuzzy, and I'm not the user, I'm an admin), the consumer
> > >> would be 10-60 (or more) Linux nodes running Spark-Streaming with
> > >> 10Gb ethernet connections. Target data rate per the user is
> > >> <200MB/sec, although I can see this scaling in the future.
> > >>
> > >> Based on the documentation, my initial thoughts were as follows:
> > >>
> > >> 3 nodes, all running ZK and the broker
> > >>
> > >> Dell R620
> > >> 2x8 core 2.6GHz Xeon
> > >> 256GB RAM
> > >> 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
> > >> 10Gb ethernet (single port)
> > >>
> > >> Do these specs make sense? Am I over or under-speccing in any of
> > >> the areas? It made sense to me to make the filesystem cache as
> > >> large as possible, particularly when I'm dealing with a small
> > >> number of brokers.
> > >>
> > >> Thanks,
> > >> Ken Carlile
> > >> Senior Unix Engineer, Scientific Computing Systems
> > >> Janelia Farm Research Campus, HHMI
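
For anyone who wants to sanity-check the numbers in Ken's original spec, here is a rough back-of-the-envelope sketch in Python. It only uses the figures from the thread (3 nodes, 5 x 300GB broker drives each, 256GB RAM, 10Gb ethernet, 200MB/sec as the upper bound). Everything else is an assumption made for illustration, not something stated in the thread: decimal units, a sustained ingest at the full 200MB/sec, replication factor 1, no RAID or filesystem overhead, and all RAM counted as page cache (so the cache figure is an upper bound).

```python
# Back-of-the-envelope sizing from the figures quoted in the thread.
# Assumptions (NOT from the thread): decimal units (1 GB = 1000 MB),
# sustained ingest at the full target rate, replication factor 1,
# no RAID/filesystem overhead, all RAM usable as page cache.

INGEST_MB_PER_S = 200      # upper bound of the stated target rate
NODES = 3                  # brokers in the proposed cluster
BROKER_DRIVES = 5          # drives per node given to the broker
DRIVE_GB = 300             # capacity per drive
RAM_GB = 256               # RAM per node
NIC_GBIT_PER_S = 10        # single 10Gb port per node


def seconds_to_fill(capacity_gb, rate_mb_per_s):
    """Time for a steady write rate to fill the given capacity."""
    return capacity_gb * 1000 / rate_mb_per_s


# Total raw log capacity across the cluster and how long it lasts.
cluster_log_gb = NODES * BROKER_DRIVES * DRIVE_GB
retention_s = seconds_to_fill(cluster_log_gb, INGEST_MB_PER_S)

# How much recent data could stay in page cache (upper bound).
cache_s = seconds_to_fill(NODES * RAM_GB, INGEST_MB_PER_S)

# Share of one 10Gb port used if all ingest landed on a single node.
nic_mb_per_s = NIC_GBIT_PER_S * 1000 / 8
ingest_share = INGEST_MB_PER_S / nic_mb_per_s

print(f"raw log capacity:     {cluster_log_gb} GB")
print(f"time to fill disks:   {retention_s / 3600:.2f} hours")
print(f"time to fill RAM:     {cache_s / 60:.0f} minutes")
print(f"ingest vs one NIC:    {ingest_share:.0%}")
```

Under these assumptions the disks hold a bit over six hours of data at the full rate, and a replication factor of N would divide that by N. Consumer fan-out is not modelled here; with 10-60 Spark nodes reading, outbound traffic per broker could be a larger factor than ingest.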