As for KS, you can think of each single-threaded Kafka Streams instance as a
normal producer plus a normal consumer client, so you can do the math for
capacity planning purposes as long as you understand your application traffic
and your state stores' write amplification (if you use the default persistent
key-value store, for example, you can tune the amplification by configuring
RocksDB options such as block size through the Streams configs).
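As a minimal sketch of wiring this up (the app id, broker address, and the config-setter class name `com.example.MyRocksDbConfig` are hypothetical placeholders; the setter class itself would implement Streams' RocksDB config-setter interface and adjust options like block size there):

```java
import java.util.Properties;

public class StreamsRocksDbTuning {
    public static Properties streamsProps() {
        Properties props = new Properties();
        // application.id also becomes the prefix of auto-created internal topics
        props.put("application.id", "capacity-demo");          // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092");      // assumption: local broker
        // Plug in a custom RocksDB config setter (class name is hypothetical);
        // inside it you can set RocksDB options such as block size to trade off
        // write amplification against read performance.
        props.put("rocksdb.config.setter", "com.example.MyRocksDbConfig");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("rocksdb.config.setter"));
    }
}
```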

The application may auto-create some changelog / repartition topics on the
broker, but since these are per-application, the additional number of topics
should not be an issue.
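As a rough, illustrative back-of-the-envelope calculation (the real internal-topic count depends entirely on your topology; this only assumes internal topics inherit the input partition count, and the function and numbers below are made up for the example):

```java
public class TopicCountEstimate {
    // Illustrative arithmetic only, not a Kafka Streams API.
    // Assumption: each stateful operation gets one changelog topic, each
    // key-changing operation before an aggregation/join gets one repartition
    // topic, and each internal topic has as many partitions as the input.
    public static int internalTopicPartitions(int inputPartitions,
                                              int statefulOps,
                                              int repartitionOps) {
        return (statefulOps + repartitionOps) * inputPartitions;
    }

    public static void main(String[] args) {
        // e.g. a 12-partition input topic, 2 stateful ops, 1 repartition step
        System.out.println(internalTopicPartitions(12, 2, 1)); // prints 36
    }
}
```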


Guozhang


On Sat, Jul 23, 2016 at 3:03 AM, Jagat Singh <jagatsi...@gmail.com> wrote:

> My post is not directly referring to KS.
>
> The new free book by O'Reilly has a very good explanation of Kafka topic
> counts.
>
> You can download it from the link below (see Chapter 4):
>
> http://shop.oreilly.com/product/0636920049463.do
>
> In short, quoting from there:
>
> >>> These problems are likely substantially tied to the fundamental
> implementation decisions that underpin how Kafka works. In particular, as
> the number of topics increases, the amount of random I/O that is imposed on
> the broker increases dramatically because each topic partition write is
> essentially a separate file append operation. This becomes more and more
> problematic as the number of partitions increases and is very difficult to
> fix without Kafka taking over the scheduling of I/O. Just above the current
> limits on number of partitions, there are likely other limits waiting, some
> fairly serious. In particular, the number of file descriptors that a single
> process can open is typically limited.
>
> Kafka can store offsets in both Kafka and ZooKeeper. In newer releases it
> is recommended to store consumer offsets inside Kafka.
>
> For high producer throughput, see the batch.size and linger.ms properties.
> On the consumer side, see fetch.min.bytes and max.partition.fetch.bytes.
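> A minimal properties-file sketch of these knobs (the values are purely
> illustrative, not recommendations; tune them against your own workload):
>
> ```
> # Producer: batch more records per request before sending
> batch.size=65536
> linger.ms=5
> # Consumer: control how much data each fetch waits for / returns
> fetch.min.bytes=1024
> max.partition.fetch.bytes=1048576
> ```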
>
>
>
> On 23 July 2016 at 16:40, Alex Glikson <glik...@il.ibm.com> wrote:
>
> > Hi all,
> >
> > I wonder whether the limitations mentioned in [1] regarding Kafka's
> > scalability in the number of topics are still valid. For example, have
> > the recent design changes around the usage of ZooKeeper versus the
> > internal membership protocol affected scalability, one way or the other?
> > Also, it seems that the introduction of Kafka Streams may increase the
> > number of topics (including those created by the app and those created
> > internally by KS), and maybe also change the usage pattern in general
> > (if a large portion of the load is generated by KS). Are there any
> > performance numbers (e.g., for the table-based APIs), known bottlenecks
> > (specific to KS), or tuning recommendations?
> >
> > Thanks,
> > Alex
> >
> >
> > [1] https://www.quora.com/How-many-topics-can-be-created-in-Apache-Kafka
> >
> >
>



-- 
-- Guozhang
