Hey, you are making some wrong assumptions here. Kafka Streams is in no way single-threaded or limited to one physical instance. Having connectivity issues to your brokers is, IMO, a problem with the deployment and not at all with how Kafka Streams is designed and works.
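To make the threading point concrete, here is a minimal sketch of the configuration for one Streams instance. The keys `application.id`, `bootstrap.servers` and `num.stream.threads` are standard Kafka Streams settings; the class name, the application id and the broker address are made up for illustration. Plain string keys are used so the snippet compiles without the Kafka jars.

```java
import java.util.Properties;

public class StreamsThreadingSketch {

    // Builds the configuration for one instance of a Kafka Streams application.
    static Properties buildConfig(String bootstrapServers, int threads) {
        Properties props = new Properties();
        // Hypothetical application id. Every instance started with the same id
        // joins the same group, and the work is divided among them.
        props.setProperty("application.id", "example-streams-app");
        // Broker address is supplied by the caller (placeholder in main below).
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Number of stream threads inside this single instance.
        props.setProperty("num.stream.threads", Integer.toString(threads));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig("broker1:9092", 4);
        System.out.println("threads per instance: "
                + props.getProperty("num.stream.threads"));
    }
}
```

In a real application these properties would be passed to `new KafkaStreams(topology, props)`. Starting the same application on more machines with the same `application.id` spreads the input partitions over all of them, so parallelism is bounded by the partition count rather than by one thread or one host.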
Kafka Streams moves hundreds of GB per day for us.

Hope this helps.

Best, Jan

On 29.11.2017 15:10, Adrienne Kole wrote:
Hi,

The purpose of this email is to get an overall intuition for the future plans of the Streams library. The main question is: will it remain a single-threaded application in the long run and serve microservices use cases, or are there any plans to extend it into a multi-node execution framework with less Kafka dependency?

Currently, each Streams node 'talks' with the Kafka cluster, and the nodes can talk with each other only indirectly, again through Kafka. However, especially if Kafka is not in the same network as the Streams nodes (actually this can happen even if they are in the same network), this will cause high network overhead and inefficiency.

One solution for this (bypassing the network overhead) is to deploy the Streams nodes on the Kafka cluster to ensure data locality. However, this is not recommended, as the library and Kafka can affect each other's performance, and Streams does not necessarily have to know the internal data partitioning of Kafka.

Another solution would be extending the Streams library to have a common runtime. IMO, preserving the current selling points of Streams (like dynamic scale in/out) alongside this kind of extension would be a very good improvement.

So my question is: will Streams, in the long or short run, extend its use cases to massive and efficient stream processing (and compete with Spark), or will it stay with and strengthen its current position?

Cheers, Adrienne