Spark Structured Streaming has some significant limitations compared to
Kafka Streams.
This one has always proved hard to overcome:
"Multiple streaming aggregations (i.e. a chain of aggregations on a
streaming DF) are not yet supported on streaming Datasets."
On Thu, 29 Apr. 2021, 8:13 am Pa
Matthias,
I will create a KIP or ticket for tracking this issue.
-thanks
Mohan
On 4/28/21, 1:01 PM, "Matthias J. Sax" wrote:
Feel free to do a KIP and contribute to Kafka!
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
Or create a ticket
Hi,
"I'd assume this is because Kafka Streams is positioned for
building streaming applications, rather than doing analytics, whereas Spark
is more often used for analytics purposes."
Well, that is not necessarily the full picture. Spark can do both analytics and
streaming, especially with Spark Structur
> I am not sure I understand. We have built several analytics applications.
We typically use custom aggregations as they are not available directly in
the library.
Oh for sure! I was answering this question:
> Is there any reason why it is not provided as part of the library?
And assuming tha
Feel free to do a KIP and contribute to Kafka!
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
Or create a ticket for tracking.
-Matthias
On 4/28/21 12:49 PM, Parthasarathy, Mohan wrote:
> Andrew,
>
> I am not sure I understand. We have built several analytics ap
Andrew,
I am not sure I understand. We have built several analytics applications. We
typically use custom aggregations as they are not available directly in the
library.
-mohan
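
For readers unfamiliar with what a "custom aggregation" looks like here, the following is a minimal sketch of a per-key average, written in plain Java with no Kafka dependency. It loosely mirrors the initializer-plus-adder shape that Kafka Streams' aggregate() expects, but every name in it (CustomAggregationSketch, Event, CountAndSum, aggregateByKey, averageByKey) is hypothetical and chosen only for illustration; it is not code from this thread or from the Kafka library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

public class CustomAggregationSketch {

    // Hypothetical input record: a key and a numeric value.
    public static final class Event {
        final String key;
        final double value;

        public Event(String key, double value) {
            this.key = key;
            this.value = value;
        }
    }

    // Running state for an average: the count and sum seen so far.
    public static final class CountAndSum {
        final long count;
        final double sum;

        public CountAndSum(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }

        public double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // Folds events into per-key state. The initializer supplies the empty
    // state for a new key; the adder combines the state with one value.
    public static <A> Map<String, A> aggregateByKey(
            List<Event> events,
            Supplier<A> initializer,
            BiFunction<A, Double, A> adder) {
        Map<String, A> state = new HashMap<>();
        for (Event e : events) {
            A current = state.containsKey(e.key) ? state.get(e.key) : initializer.get();
            state.put(e.key, adder.apply(current, e.value));
        }
        return state;
    }

    // The custom aggregation itself: an average built from count and sum.
    public static Map<String, CountAndSum> averageByKey(List<Event> events) {
        return aggregateByKey(
                events,
                () -> new CountAndSum(0, 0.0),
                (acc, v) -> new CountAndSum(acc.count + 1, acc.sum + v));
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("a", 1.0), new Event("b", 10.0), new Event("a", 3.0));
        averageByKey(events).forEach(
                (k, v) -> System.out.println(k + " -> " + v.average()));
    }
}
```

The point of the sketch is that an average cannot be kept as a single number; the state has to carry both count and sum, which is the kind of boilerplate a built-in aggregation would hide.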
On 4/28/21, 12:12 PM, "Andrew Otto" wrote:
I'd assume this is because Kafka Streams is positioned for buildi
Matthias,
Once a Spark DataFrame is created by reading the data from Kafka
(https://sparkbyexamples.com/spark/spark-streaming-with-kafka/), you can use
Spark SQL, and all the aggregations shown on this page are valid. I
feel that having this built into the Kafka Streams library would make
I'd assume this is because Kafka Streams is positioned for building
streaming applications, rather than doing analytics, whereas Spark is more
often used for analytics purposes.
I am not familiar with all the details about Spark; however, the link
you shared is for Spark SQL. I thought Spark SQL was for batch processing
only?
Personally, I would be open to adding more built-in aggregations next to
count(). It has not come up in the community so far, so there was no
investment
Hi,
Whenever the question of which streaming framework to use for near-real-time
analytics comes up, there is normally a discussion about Spark vs. Kafka Streams.
One of the points in favor of Spark streaming is the simple aggregations that are
built in. See here:
https://sparkbyexamples.com/spark/