Hi Clive,

The behaviour you are seeing is indeed correct (though not necessarily optimal 
in terms of performance as described in this JIRA: 
https://issues.apache.org/jira/browse/KAFKA-3101 
<https://issues.apache.org/jira/browse/KAFKA-3101>)

The key observation is that windows never close/complete. There could always be 
late arriving events that appear long after a window's end interval and those 
need to be accounted for properly. In Kafka Streams that means that such late 
arriving events continue to update the value of the window. As described in the 
above JIRA, some optimisations could still be possible (e.g., batch requests as 
described in KIP-63 
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
 however they are not implemented yet.

So your code needs to handle each update.

Thanks
Eno



> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
> 
> Hi,
>  I would like to process a stream with a tumbling window of 5secs, create 
> aggregated stats for keys and push the final aggregates at the end of each 
> window period to a analytics backend. I have tried doing something like:
>   stream
>         .map
>         .reduceByKey(...
>           , TimeWindows.of("mywindow", 5000L),...)
>         .foreach         {            send stats
>          }
> But I get every update to the ktable in the foreach.
> How do I just get the final values once the TumblingWindow is complete so I 
> can iterate over them and send to some external system?
> Thanks,
>  Clive
> PS Using kafka_2.10-0.10.0.0
> 

Reply via email to