Thanks Todd. That's the current thinking. We use multiple clusters in a
single data center for Solr to avoid a similar limit: the number of
collections per cluster, in Solr's case.
    Your numbers are encouraging. I will go ahead with this design for now.
Thanks!
Neelesh
On Mar 21, 2014 6:42 AM, "Todd Palino" <tpal...@linkedin.com> wrote:

> Filehandles can definitely be a concern here, but you can mitigate it to
> some extent by adding more brokers to the cluster. The number of open file
> handles is going to be driven in large part by the number of log files on
> disk. This, in turn, is governed by the number of partitions and how many
> files you have for each partition. That, in turn, is governed by the
> amount of data you see for each partition and your retention settings.
> It's a lot of tweaking :)
>
> So, for one example, I have a cluster with over 13k partitions on it. Up
> until recently, it was running on 5 brokers (we just added 4 more). One of
> those brokers has about 6300 log files on disk right now, and it's running
> with over 7000 open filehandles. Given that the message traffic stays the
> same, if I wanted to reduce the number of log files, I could increase the
> size of each log file. I could also decrease the retention time for the
> data. Another option is to increase the number of brokers, to spread out
> the load more evenly. So you have options to keep your filehandles
> manageable.
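As a rough sketch of that back-of-envelope math (all numbers below are
made-up assumptions for illustration, not figures from Todd's cluster, and
the estimate ignores index files and non-log filehandles):

```python
# Rough, hypothetical estimate of open filehandles on one broker.
# Every input value here is an assumed example, not a real cluster number.

def estimate_open_files(partitions_on_broker,
                        bytes_per_partition_per_day,
                        segment_bytes,
                        retention_days):
    """Segments per partition ~= retained bytes / segment size, plus one
    active segment; each retained segment is one log file on disk."""
    retained_bytes = bytes_per_partition_per_day * retention_days
    segments_per_partition = retained_bytes // segment_bytes + 1
    return partitions_on_broker * segments_per_partition

# Assumed example: 2600 partitions on the broker, 1 GB per partition per
# day, 1 GB segment files, 2-day retention.
files = estimate_open_files(2600, 1 << 30, 1 << 30, 2)
print(files)
```

The levers mentioned above map directly onto the parameters: raising
`segment_bytes`, lowering `retention_days`, or spreading partitions across
more brokers each shrinks the result.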
>
> The other side of this is whether or not the controller can efficiently
> handle that many topics and partitions in a single cluster, but that's not
> a filehandles problem. So far, I've not seen controller performance issues
> with any of our clusters, but that could potentially change if you go up
> an order of magnitude on the partition count in a single cluster. Is there
> the possibility of splitting your customers out to multiple clusters if
> that was identified as a problem?
>
> -Todd
>
> On 3/20/14 9:30 PM, "Neelesh" <neele...@gmail.com> wrote:
>
> >Hi,
> >   We are prototyping Kafka + Storm for our stream processing / event
> >processing needs. One issue we face is a huge influx of stream data
> >from one of our customers. If we have a single topic shared by all
> >customers, the other customers queued behind the big customer's stream
> >would be starved for a significant time until their turn comes.
> >    One idea is to create a topic per customer per use case, implement a
> >fairness algorithm on top of the high-level consumer using
> >*createMessageStreamsByFilter*, and use that to build a Storm spout.
> >However, this also means tens of thousands of topics and several tens of
> >thousands (even hundreds of thousands) of partitions on a single Kafka
> >cluster.
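A fairness layer like the one described could, as a toy sketch, round-robin
across per-customer streams instead of draining them in arrival order. The
in-memory deques below stand in for the per-topic streams that
*createMessageStreamsByFilter* would actually return; none of this is the
real Kafka consumer API:

```python
from collections import deque

def fair_merge(streams):
    """Round-robin one message at a time across per-customer queues,
    so a single high-volume customer cannot starve the others.
    `streams` maps customer name -> deque of pending messages."""
    active = {c: q for c, q in streams.items() if q}
    while active:
        for customer in list(active):
            q = active[customer]
            yield customer, q.popleft()
            if not q:
                del active[customer]  # drop drained customers

streams = {
    "big":   deque(f"b{i}" for i in range(5)),  # flood from one customer
    "small": deque(["s0", "s1"]),
}
# "small"'s messages interleave with "big"'s instead of waiting behind all
# of them.
print(list(fair_merge(streams)))
```

In a real spout, each deque would be fed by one customer's Kafka stream and
the merge would poll rather than drain, but the scheduling idea is the same.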
> >     I remember reading that you are effectively limited by filehandles.
> >Has anyone tried such a setup?
> >
> >Thanks!
> >-Neelesh
>
>
