Re: Hardware planning

Otis Gospodnetic Wed, 19 Mar 2014 21:25:26 -0700

Ray,

We are, for SPM <http://sematext.com/spm>.  On c1.medium instances, I
believe, we have:
* Jetty receiving tens of thousands of metrics per second (in batches, so
the rate of HTTP requests is lower than that number)
* Kafka brokers
* ZK instances


So far we have not had issues with this. Knock on wood.  Disk IO is not
high, nor is the CPU usage.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Sat, Mar 15, 2014 at 11:09 AM, Ray Rodriguez <rayrod2...@gmail.com> wrote:

> Imagine a situation where a node running both a kafka broker and a
> zookeeper node goes down.  You now have to contend with two distributed
> systems recovering at once: leader election and consensus in the case of
> the zookeeper ensemble, and partition rebalancing/repair in the case of the
> kafka cluster.  So I think Jun's point is that when running distributed
> systems you should try to keep them off the same node as much as possible,
> to achieve better fault tolerance and high availability.
>
> From the Kafka docs you can see that a zookeeper cluster doesn't need to sit
> on very powerful hardware to be reliable, so I believe the suggestion is to
> run a small independent zookeeper cluster that will be used by kafka.  By
> all means don't hesitate to reuse that zookeeper ensemble for other systems,
> as long as you can guarantee that all the systems using the zk ensemble use
> some form of znode root to keep their data separated within the zookeeper
> znode directory structure.
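
One way to get the "znode root" separation described above is a chroot
suffix on the ZooKeeper connect string, which Kafka accepts in its
zookeeper.connect broker property. The sketch below only builds such a
string; the host names zk1..zk3, the /kafka path, and the helper name
zk_connect are hypothetical and not taken from this thread.

```python
def zk_connect(hosts, chroot, port=2181):
    """Build a ZooKeeper connect string with a chroot path suffix.

    The chroot is appended once, after the last host:port pair, so every
    znode the client creates lives under that path.
    """
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    servers = ",".join(f"{host}:{port}" for host in hosts)
    return servers + chroot.rstrip("/")


# Hypothetical three-node ensemble shared by several systems.
kafka_connect = zk_connect(["zk1", "zk2", "zk3"], "/kafka")
print(f"zookeeper.connect={kafka_connect}")
```

Another system sharing the same ensemble would be pointed at a different
root (for example /hbase), so the two never touch each other's znodes.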
>
> This is an interesting topic and I'd love to hear if anyone else is running
> their zk alongside their kafka brokers in production.
>
> Ray
>
>
> On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken
> <carli...@janelia.hhmi.org> wrote:
>
> > I'd rather not purchase dedicated hardware for ZK if I don't absolutely
> > have to, unless I can use it for multiple clusters (i.e. Kafka, HBase,
> > other things that rely on ZK). Would adding more cores help with ZK on
> > the same machine? Or is that just a waste of cores, considering that it's
> > Java under all of this?
> >
> > --Ken
> >
> > On Mar 15, 2014, at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > The spec looks reasonable. If you have other machines, it may be
> > > better to put ZK on its own machines.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken
> > > <carli...@janelia.hhmi.org> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'm looking at setting up a (small) Kafka cluster for streaming
> > >> microscope data to Spark-Streaming.
> > >>
> > >> The producer would be a single Windows 7 machine with a 1Gb or 10Gb
> > >> ethernet connection running http posts from Matlab (this bit is a
> > >> little fuzzy, and I'm not the user, I'm an admin), the consumer would
> > >> be 10-60 (or more) Linux nodes running Spark-Streaming with 10Gb
> > >> ethernet connections. Target data rate per the user is <200MB/sec,
> > >> although I can see this scaling in the future.
> > >>
> > >> Based on the documentation, my initial thoughts were as follows:
> > >>
> > >> 3 nodes, all running ZK and the broker
> > >>
> > >> Dell R620
> > >> 2x8 core 2.6GHz Xeon
> > >> 256GB RAM
> > >> 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
> > >> 10Gb ethernet (single port)
> > >>
> > >> Do these specs make sense? Am I over or under-speccing in any of the
> > >> areas? It made sense to me to make the filesystem cache as large as
> > >> possible, particularly when I'm dealing with a small number of
> > >> brokers.
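
As a rough check on the spec quoted above (3 brokers, 256GB RAM each, five
300GB disks per broker for Kafka data, up to 200MB/sec of ingest), the
sketch below estimates how long the page cache and the data disks would
last before wrapping. The replication factor of 3, the 80% share of RAM
available as page cache, decimal units, and even partition spread are all
assumptions made for illustration; none of them come from the thread.

```python
def per_broker_write_mb_s(ingest_mb_s, replication, brokers):
    """Write rate each broker absorbs if partitions are spread evenly."""
    return ingest_mb_s * replication / brokers


def seconds_until_full(capacity_gb, write_mb_s):
    """How long a buffer of capacity_gb lasts at write_mb_s (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / write_mb_s


# From the thread: 200 MB/sec upper bound, 3 brokers, 256 GB RAM, 5 x 300 GB.
# Assumed: replication factor 3, 80% of RAM usable as page cache.
write_rate = per_broker_write_mb_s(ingest_mb_s=200, replication=3, brokers=3)
cache_window_s = seconds_until_full(capacity_gb=256 * 0.8, write_mb_s=write_rate)
disk_window_s = seconds_until_full(capacity_gb=5 * 300, write_mb_s=write_rate)

print(f"per-broker write rate: {write_rate:.0f} MB/sec")
print(f"page cache window:     {cache_window_s / 60:.0f} min")
print(f"disk retention window: {disk_window_s / 3600:.1f} h")
```

Under these assumptions each broker absorbs about 200 MB/sec, the page
cache covers roughly 17 minutes of data, and the five data disks fill in
about two hours, so retention settings matter as much as RAM at this rate.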
> > >>
> > >> Thanks,
> > >> Ken Carlile
> > >> Senior Unix Engineer, Scientific Computing Systems
> > >> Janelia Farm Research Campus, HHMI
> > >>
> >
> >
>
