Hi Clive,

For now this optimisation is not present; we're working on it as part of KIP-63. One manual work-around might be to use a simple key-value store to deduplicate the final output before sending it to the backend. It could have a simple policy like "output all values at 1-second intervals" or "output after 10 records have been received".
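A minimal sketch of that work-around in plain Java (this is not Kafka Streams API; the class and method names are illustrative). It keeps only the latest value per key and emits a consolidated batch either after a time interval or after a number of updates, whichever comes first:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative dedup buffer: call put() for every downstream update,
// then maybeFlush() periodically; only the latest value per key survives.
class DedupBuffer<K, V> {
    private final long intervalMs;
    private final int maxRecords;
    private final Map<K, V> latest = new LinkedHashMap<>();
    private long lastFlushMs;
    private int updates;

    DedupBuffer(long intervalMs, int maxRecords, long nowMs) {
        this.intervalMs = intervalMs;
        this.maxRecords = maxRecords;
        this.lastFlushMs = nowMs;
    }

    // Record an update; overwrites any earlier value for the same key.
    void put(K key, V value) {
        latest.put(key, value);
        updates++;
    }

    // Returns the batch to send to the backend if a flush is due, else null.
    Map<K, V> maybeFlush(long nowMs) {
        if (updates == 0) {
            return null;
        }
        if (updates >= maxRecords || nowMs - lastFlushMs >= intervalMs) {
            Map<K, V> batch = new LinkedHashMap<>(latest);
            latest.clear();
            updates = 0;
            lastFlushMs = nowMs;
            return batch;
        }
        return null;
    }
}
```

In a real topology the same idea would live in a state store consulted from the `foreach`, but the flush policy is the same.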
Eno

> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>
> Thanks Eno for your comments and references.
> Perhaps I can explain what I want to achieve and maybe you can suggest the correct topology?
> I want to process a stream of events, do aggregation, and send the results to an analytics backend (InfluxDB), so that rather than sending 1000 points/sec to the analytics backend I send a much lower rate. I'm only interested in the processing time of each event, so in that respect there are no "late arriving" events. I was hoping I could use a tumbling window which, once its end-time had passed, would let me send the consolidated aggregation for that window and then throw the window away.
>
> It sounds like from the references you give that this is not possible at present in Kafka Streams?
>
> Thanks,
> Clive
>
> On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> wrote:
>
> Hi Clive,
>
> The behaviour you are seeing is indeed correct (though not necessarily optimal in terms of performance, as described in this JIRA: https://issues.apache.org/jira/browse/KAFKA-3101)
>
> The key observation is that windows never close/complete. There could always be late-arriving events that appear long after a window's end interval, and those need to be accounted for properly. In Kafka Streams that means such late-arriving events continue to update the value of the window. As described in the above JIRA, some optimisations could still be possible (e.g., batch requests as described in KIP-63: https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams), however they are not implemented yet.
>
> So your code needs to handle each update.
>
> Thanks,
> Eno
>
>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>
>> Hi,
>> I would like to process a stream with a tumbling window of 5 secs, create aggregated stats for keys, and push the final aggregates at the end of each window period to an analytics backend. I have tried doing something like:
>>
>>     stream
>>       .map(...)
>>       .reduceByKey(..., TimeWindows.of("mywindow", 5000L), ...)
>>       .foreach((key, value) -> { /* send stats */ });
>>
>> But I get every update to the KTable in the foreach.
>> How do I get just the final values once the tumbling window is complete, so I can iterate over them and send them to some external system?
>> Thanks,
>> Clive
>> PS Using kafka_2.10-0.10.0.0
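For reference, the behaviour Eno describes (the `foreach` firing once per incoming record, because windows never close) can be simulated outside Kafka Streams. This is a plain-Java sketch, not the Streams runtime; names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates a tumbling-window count: each record updates its window's
// aggregate and a fresh update is emitted downstream immediately, so a
// window that receives N records produces N downstream updates.
class WindowedReduceSim {
    static List<String> run(long windowSizeMs, long[] timestamps, String key) {
        Map<Long, Integer> counts = new HashMap<>();  // window start -> count
        List<String> emitted = new ArrayList<>();     // what foreach would see
        for (long ts : timestamps) {
            long windowStart = ts - (ts % windowSizeMs);
            int c = counts.merge(windowStart, 1, Integer::sum);
            emitted.add(key + "@" + windowStart + "=" + c);
        }
        return emitted;
    }
}
```

Three records landing in the same 5-second window yield three updates for that window, which is why a dedup step before the backend is needed.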