Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Liam Clarke-Hutchinson
Spark Structured Streaming has some significant limitations compared to Kafka Streams. This one has always proved hard to overcome: "Multiple streaming aggregations (i.e. a chain of aggregations on a streaming DF) are not yet supported on streaming Datasets." On Thu, 29 Apr. 2021, 8:13 am Pa

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Matthias, I will create a KIP or ticket for tracking this issue. -thanks Mohan On 4/28/21, 1:01 PM, "Matthias J. Sax" wrote: Feel free to do a KIP and contribute to Kafka! https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Or create a ticket

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Mich Talebzadeh
Hi, "I'd assume this is because Kafka Streams is positioned for building streaming applications, rather than doing analytics, whereas Spark is more often used for analytics purposes." Well, that is not necessarily the full picture. Spark can do both analytics and streaming, especially with Spark Structur

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Andrew Otto
> I am not sure I understand. We have built several analytics applications. We typically use custom aggregations as they are not available directly in the library. Oh for sure! I was answering this question: > . Is there any reason why it is not provided as part of the library ? And assuming tha

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Matthias J. Sax
Feel free to do a KIP and contribute to Kafka! https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Or create a ticket for tracking. -Matthias On 4/28/21 12:49 PM, Parthasarathy, Mohan wrote: > Andrew, > > I am not sure I understand. We have built several analytics ap

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Andrew, I am not sure I understand. We have built several analytics applications. We typically use custom aggregations as they are not available directly in the library. -mohan On 4/28/21, 12:12 PM, "Andrew Otto" wrote: I'd assume this is because Kafka Streams is positioned for buildi

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Matthias, Once a Spark dataframe is created by reading the data from Kafka (https://sparkbyexamples.com/spark/spark-streaming-with-kafka/), you can use Spark SQL, and all the aggregations that are shown on this page are valid. I feel that having this built into the Kafka Streams library would make

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Andrew Otto
I'd assume this is because Kafka Streams is positioned for building streaming applications, rather than doing analytics, whereas Spark is more often used for analytics purposes.

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Matthias J. Sax
I am not familiar with all the details about Spark; however, the link you shared is for Spark SQL. I thought Spark SQL is for batch processing only? Personally, I would be open to adding more built-in aggregations next to count(). It did not come up in the community so far, so there was no investment

Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Hi, Whenever the question of which streaming framework to use for near-realtime analytics comes up, there is normally a discussion about Spark vs Kafka streaming. One of the points in favor of Spark streaming is the simple aggregations that are built-in. See here: https://sparkbyexamples.com/spark/