Hi Clive,

For now this optimisation is not present. We're working on it as part of 
KIP-63. One manual work-around might be to use a simple Key-value store to 
deduplicate the final output before sending to the backend. It could have a 
simple policy like "output all values at 1 second intervals" or "output after 
10 records have been received".

Eno


> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
> 
> 
> Thanks Eno for your comments and references.
> Perhaps, I can explain what I want to achieve and maybe you can suggest the 
> correct topology?
> I want process a stream of events and do aggregation and send to an analytics 
> backend (Influxdb), so that rather than sending 1000 points/sec to the 
> analytics backend, I send a much lower value. I'm only interested in using 
> the processing time of the event so in that respect there are no "late 
> arriving" events.I was hoping I could use a Tumbling window which when its 
> end-time had been passed I can send the consolidated aggregation for that 
> window and then throw the Window away. 
> 
> It sounds like from the references you give that this is not possible at 
> present in Kafka Streams?
> 
> Thanks,
> Clive 
> 
>    On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> 
> wrote:
> 
> 
> Hi Clive,
> 
> The behaviour you are seeing is indeed correct (though not necessarily 
> optimal in terms of performance as described in this JIRA: 
> https://issues.apache.org/jira/browse/KAFKA-3101 
> <https://issues.apache.org/jira/browse/KAFKA-3101>)
> 
> The key observation is that windows never close/complete. There could always 
> be late arriving events that appear long after a window's end interval and 
> those need to be accounted for properly. In Kafka Streams that means that 
> such late arriving events continue to update the value of the window. As 
> described in the above JIRA, some optimisations could still be possible 
> (e.g., batch requests as described in KIP-63 
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
>  however they are not implemented yet.
> 
> So your code needs to handle each update.
> 
> Thanks
> Eno
> 
> 
> 
>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>> 
>> Hi,
>>   I would like to process a stream with a tumbling window of 5secs, create 
>> aggregated stats for keys and push the final aggregates at the end of each 
>> window period to a analytics backend. I have tried doing something like:
>>   stream
>>         .map
>>         .reduceByKey(...
>>           , TimeWindows.of("mywindow", 5000L),...)
>>         .foreach        {            send stats
>>           }
>> But I get every update to the ktable in the foreach.
>> How do I just get the final values once the TumblingWindow is complete so I 
>> can iterate over them and send to some external system?
>> Thanks,
>>   Clive
>> PS Using kafka_2.10-0.10.0.0
>> 
> 
> 

Reply via email to