Thanks Guozhang! I am asking primarily because I have seen Flink and Spark Streaming users boasting of millions of records per second being processed, and I was interested to learn where Kafka Streams / KSQL stands. This would also help a lot with capacity planning for teams looking to use Kafka Streams. Is there a general rule of thumb for the upper bound on performance of a Kafka Streams app for, say, a single stream-to-table join?
As for KIP-213... I need something else to start taking a look at while it awaits review :). Thanks, I'll look into what you posted.

Adam

On Wed, Aug 22, 2018 at 7:42 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Adam,
>
> Thanks for your interest in working on potential performance improvements
> for Kafka Streams / KSQL (I thought the non-key joining would take most of
> your time :P )
>
> Currently there are no published performance numbers for the latest
> versions of Streams, AFAIK. Personally, I run the Streams SimpleBenchmark
> (https://github.com/apache/kafka/blob/trunk/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py)
> and profile it when necessary to figure out the performance bottlenecks.
> If you are interested, you can follow a similar approach. There are also
> some JIRAs open for potential performance improvements:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20component%20%3D%20%22streams%22%20%20AND%20labels%20%3D%20performance%20%20
>
>
> Guozhang
>
> On Wed, Aug 22, 2018 at 7:02 AM, Adam Bellemare <adam.bellem...@gmail.com>
> wrote:
>
> > Blog post in question:
> > https://www.confluent.io/blog/ksql-february-release-streaming-sql-for-apache-kafka/
> >
> > On Wed, Aug 22, 2018 at 10:01 AM, Adam Bellemare <adam.bellem...@gmail.com>
> > wrote:
> >
> > > Hi All
> > >
> > > I am looking for performance metrics related to Kafka Streams and KSQL.
> > > I have been scouring various blogs, including the Confluent one, looking
> > > for any current performance metrics or benchmarks, official or otherwise,
> > > on both Kafka Streams and KSQL for Kafka 2.x+. Unfortunately, almost
> > > everything I am finding is 0.x.
> > >
> > > In this particular blog post on KSQL, there is the following quotation:
> > >
> > > > For example, our soak testing cluster has racked up over 1,000 hours
> > > > and runs KSQL workloads 24×7. The performance tests we conduct allow
> > > > us to understand performance characteristics of stateless and stateful
> > > > KSQL queries. We currently run over 42 different tests that collect
> > > > more than 700 metrics.
> > >
> > > I assume that there is also some information related to Kafka Streams in
> > > similar tests. Does anyone know where I can find these results? Or does
> > > anyone have any blog posts or other materials that look at the
> > > performance of either one of these for Kafka 2.x?
> > >
> > > For context, I am asking this question to get a better understanding of
> > > current Kafka Streams / KSQL performance, such that contributors can
> > > understand the prioritization of performance-related improvements vs.
> > > feature-related improvements.
> > >
> > > Thanks
> > > Adam
> > >
> >
>
> --
> -- Guozhang