OK, I understand. So for the Zookeeper cluster, can I go with something like: 

3 x Dell R320: 
Single hexcore 2.5GHz Xeon, 32GB RAM, 4x10K 300GB SAS drives, 10GbE

and if I do, can I drop the CPU specs on the broker machines to, say, dual 
6-cores? Or are we looking at something that is core-bound here? 

Thanks, 
Ken
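
[A quick back-of-envelope check of the <200MB/sec target quoted further down in the thread. It assumes a replication factor of 3 and three brokers, neither of which is stated in the thread, and it only counts inbound traffic:]

```python
# Back-of-envelope per-broker network load for the <200 MB/s target.
# Assumptions (not stated in the thread): replication factor 3, three
# brokers, leaders spread evenly across the cluster. Inbound only.
producer_in = 200                  # MB/s from the single producer
replication_factor = 3
brokers = 3

# Every byte produced is also fetched by (replication_factor - 1)
# follower replicas, so total cluster inbound is produce rate x RF.
cluster_inbound = producer_in * replication_factor
per_broker_inbound = cluster_inbound / brokers
per_broker_mbit = per_broker_inbound * 8

print(per_broker_inbound)  # 200.0 MB/s inbound per broker
print(per_broker_mbit)     # 1600.0 Mbit/s -- well under a 10GbE link
```

[Outbound traffic to the 10-60 Spark Streaming consumers comes on top of this, so treat the figure as a floor, not a ceiling.]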

On Mar 15, 2014, at 11:09 AM, Ray Rodriguez <rayrod2...@gmail.com> wrote:

> Imagine a situation where one of your nodes running both a kafka broker and
> a zookeeper node goes down.  You now have to contend with two distributed
> systems recovering at once: leader election and consensus in the case of the
> zookeeper ensemble, and partition rebalancing/repair in the case of the
> kafka cluster. So I think Jun's point is that when running distributed
> systems, try to isolate them from each other as much as possible rather than
> running them on the same node, to achieve better fault tolerance and high
> availability.
> 
> From the Kafka docs you can see that a zookeeper cluster doesn't need to sit
> on very powerful hardware to be reliable, so I believe the suggestion is to
> run a small independent zookeeper cluster that will be used by kafka. By all
> means don't hesitate to reuse that zookeeper ensemble for other systems, as
> long as you can guarantee that every system using the zk ensemble uses some
> form of znode root to keep its data separated within the zookeeper znode
> directory structure.
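> 
> [For reference, the znode-root separation described above is configured on
> the Kafka side as a chroot suffix on `zookeeper.connect`. A minimal sketch
> -- the hostnames and the `/kafka` path are placeholders:]
> 
> ```properties
> # server.properties on each Kafka broker (hostnames are hypothetical).
> # The /kafka suffix is a ZooKeeper chroot; each system sharing the
> # ensemble gets its own root path (e.g. /kafka, /hbase) so their data
> # stays separated in the znode tree.
> zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
> ```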
> 
> This is an interesting topic, and I'd love to hear whether anyone else is
> running their zk alongside their kafka brokers in production.
> 
> Ray
> 
> 
> On Sat, Mar 15, 2014 at 10:28 AM, Carlile, Ken 
> <carli...@janelia.hhmi.org>wrote:
> 
>> I'd rather not purchase dedicated hardware for ZK if I don't absolutely
>> have to, unless I can use it for multiple clusters (i.e. Kafka, HBase, other
>> things that rely on ZK). Would adding more cores help with ZK on the same
>> machine? Or is that just a waste of cores, considering that it's java under
>> all of this?
>> 
>> --Ken
>> 
>> On Mar 15, 2014, at 12:07 AM, Jun Rao <jun...@gmail.com> wrote:
>> 
>>> The spec looks reasonable. If you have other machines, it may be better
>> to
>>> put ZK on its own machines.
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> 
>>> On Fri, Mar 14, 2014 at 10:52 AM, Carlile, Ken <
>> carli...@janelia.hhmi.org>wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I'm looking at setting up a (small) Kafka cluster for streaming
>>>> microscope data to Spark Streaming.
>>>> 
>>>> The producer would be a single Windows 7 machine with a 1Gb or 10Gb
>>>> ethernet connection sending HTTP POSTs from MATLAB (this bit is a little
>>>> fuzzy; I'm not the user, I'm an admin). The consumers would be 10-60 (or
>>>> more) Linux nodes running Spark Streaming with 10Gb ethernet connections.
>>>> The target data rate per the user is <200MB/sec, although I can see this
>>>> scaling in the future.
>>>> 
>>>> Based on the documentation, my initial thoughts were as follows:
>>>> 
>>>> 3 nodes, all running ZK and the broker
>>>> 
>>>> Dell R620
>>>> 2x8 core 2.6GHz Xeon
>>>> 256GB RAM
>>>> 8x300GB 15K SAS drives (OS runs on 2, ZK on 1, broker on the last 5)
>>>> 10Gb ethernet (single port)
>>>> 
>>>> Do these specs make sense? Am I over- or under-speccing in any of the
>>>> areas? It made sense to me to make the filesystem cache as large as
>>>> possible, particularly since I'm dealing with a small number of brokers.
>>>> 
>>>> Thanks,
>>>> Ken Carlile
>>>> Senior Unix Engineer, Scientific Computing Systems
>>>> Janelia Farm Research Campus, HHMI
>>>> 
>> 
>> 