Thanks Eno for your comments and references.
Perhaps, I can explain what I want to achieve and maybe you can suggest the 
correct topology?
I want process a stream of events and do aggregation and send to an analytics 
backend (Influxdb), so that rather than sending 1000 points/sec to the 
analytics backend, I send a much lower value. I'm only interested in using the 
processing time of the event so in that respect there are no "late arriving" 
events.I was hoping I could use a Tumbling window which when its end-time had 
been passed I can send the consolidated aggregation for that window and then 
throw the Window away. 

It sounds like from the references you give that this is not possible at 
present in Kafka Streams?

Thanks,
Clive 

    On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> wrote:
 

 Hi Clive,

The behaviour you are seeing is indeed correct (though not necessarily optimal 
in terms of performance as described in this JIRA: 
https://issues.apache.org/jira/browse/KAFKA-3101 
<https://issues.apache.org/jira/browse/KAFKA-3101>)

The key observation is that windows never close/complete. There could always be 
late arriving events that appear long after a window's end interval and those 
need to be accounted for properly. In Kafka Streams that means that such late 
arriving events continue to update the value of the window. As described in the 
above JIRA, some optimisations could still be possible (e.g., batch requests as 
described in KIP-63 
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
 however they are not implemented yet.

So your code needs to handle each update.

Thanks
Eno



> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
> 
> Hi,
>  I would like to process a stream with a tumbling window of 5secs, create 
>aggregated stats for keys and push the final aggregates at the end of each 
>window period to a analytics backend. I have tried doing something like:
>  stream
>        .map
>        .reduceByKey(...
>          , TimeWindows.of("mywindow", 5000L),...)
>        .foreach        {            send stats
>          }
> But I get every update to the ktable in the foreach.
> How do I just get the final values once the TumblingWindow is complete so I 
> can iterate over them and send to some external system?
> Thanks,
>  Clive
> PS Using kafka_2.10-0.10.0.0
> 


  

Reply via email to