Hi Emo,

thanks for the reply!

Regarding the cache, I'm already using CACHE_MAX_BYTES_BUFFERING_CONFIG = 0 (so 
the cache is disabled).
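
For reference, this is how I'm setting it, in case I'm missing something (just 
a sketch; the application id and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "max-window-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// 0 bytes of buffering = record caches disabled, so every single update
// is forwarded downstream.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);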

Regarding the Interactive Query API (I'll take a look), it means it's up to the 
application to do something that we get out of the box with Spark.
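
If so, I guess the application-level version would be a scheduled task that 
queries the windowed store every 5 seconds and emits one value per window. 
A sketch, assuming `store` is the ReadOnlyWindowStore obtained through the 
Interactive Query API and "my-key" is a placeholder key:

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.WindowStoreIterator;

// Every 5 seconds, read the max of the window(s) that just ended and emit it
// once -- the hand-rolled equivalent of Spark's reduceByWindow output.
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
    long now = System.currentTimeMillis();
    try (WindowStoreIterator<Long> it = store.fetch("my-key", now - 5000, now)) {
        while (it.hasNext()) {
            KeyValue<Long, Long> windowStartAndMax = it.next();
            System.out.println("max = " + windowStartAndMax.value);
        }
    }
}, 5, 5, TimeUnit.SECONDS);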

May I ask what you mean by "We don’t believe in closing windows"? Isn't it 
much more code that the user has to write to get the same result?

I'm exploring Kafka Streams and IMHO it's very powerful, partly because the 
usage is pretty simple, but in this scenario it could fall short compared with 
Spark.
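
For reference, this is roughly the topology I'm using (just a sketch on the 
0.10.x DSL; the topic names, the store name and the String/Long serdes are 
placeholders):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> values =
    builder.stream(Serdes.String(), Serdes.Long(), "numbers-input");

// Max over tumbling 5-second windows; every incoming record updates the
// window's current max and the update is forwarded downstream, which is why
// 21, 25, 22, 20, 26 comes out as 21, 25, 25, 25, 26.
KTable<Windowed<String>, Long> max = values
    .groupByKey(Serdes.String(), Serdes.Long())
    .reduce((v1, v2) -> Math.max(v1, v2), TimeWindows.of(5000), "max-store");

max.toStream((windowedKey, value) -> windowedKey.key())
   .to(Serdes.String(), Serdes.Long(), "max-output");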


Thanks,

Paolo.


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>


________________________________
From: Eno Thereska <eno.there...@gmail.com>
Sent: Thursday, June 15, 2017 1:45 PM
To: users@kafka.apache.org
Subject: Re: Kafka Streams vs Spark Streaming : reduce by window

Hi Paolo,

That is indeed correct. We don’t believe in closing windows in Kafka Streams.
You could reduce the number of downstream records by using record caches: 
http://docs.confluent.io/current/streams/developer-guide.html#record-caches-in-the-dsl
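
For example, something like this (just a sketch; the cache size and commit 
interval values are arbitrary):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// With a non-zero cache, consecutive updates to the same key are coalesced
// in memory and flushed downstream at most once per commit interval.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L);
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 5000);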

Alternatively you can just query the KTable whenever you want using the 
Interactive Query APIs (so the time at which you query dictates what data you 
receive); see 
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
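
In code that's roughly (a sketch; "max-store" is whatever name the store was 
materialized under, and `streams` is the running KafkaStreams instance):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;

// Read-only handle on the windowed reduce's local state store; it can then
// be queried on demand with fetch(key, timeFrom, timeTo).
ReadOnlyWindowStore<String, Long> store =
    streams.store("max-store", QueryableStoreTypes.windowStore());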

Thanks
Eno
> On Jun 15, 2017, at 2:38 PM, Paolo Patierno <ppatie...@live.com> wrote:
>
> Hi,
>
>
> using the streams library I noticed a difference (or maybe it's a lack of 
> knowledge on my side) compared with Apache Spark.
>
> Imagine the following scenario ...
>
>
> I have a source topic where numeric values come in, and I want to compute the 
> maximum value over the latest 5 seconds but put the max value into a 
> destination topic only every 5 seconds.
>
> This is what happens with the reduceByWindow method in Spark.
>
> Here I'm using reduce on a KStream, which computes the max value taking into 
> account previous values in the latest 5 seconds, but the resulting value is 
> put into the destination topic for each incoming value.
>
>
> For example ...
>
>
> An application sends numeric values every 1 second.
>
> With Spark ... the source gets values every 1 second, processes the max in a 
> window of 5 seconds, and puts the max into the destination every 5 seconds (so 
> when the window ends). If the sequence is 21, 25, 22, 20, 26 the output will 
> be just 26.
>
> With Kafka Streams ... the source gets values every 1 second, processes the 
> max in a window of 5 seconds, and puts the max into the destination every 
> 1 second (so every time an incoming value arrives). So, for example, if the 
> sequence is 21, 25, 22, 20, 26 ... the output will be 21, 25, 25, 25, 26.
>
>
> Is this possible with Kafka Streams? Or is it something to be done at the 
> application level?
>
>
> Thanks,
>
> Paolo
>
>
> Paolo Patierno
> Senior Software Engineer (IoT) @ Red Hat
> Microsoft MVP on Windows Embedded & IoT
> Microsoft Azure Advisor
>
> Twitter : @ppatierno<http://twitter.com/ppatierno>
> Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
> Blog : DevExperience<http://paolopatierno.wordpress.com/>
