Hi Clive,

We are working on exposing the state store behind a KTable, as part of allowing queries against the structures currently hidden behind the language (DSL). The KIP should be out today or tomorrow for you to have a look. You can probably do what you need using the low-level Processor API, but then you'd lose the benefits of the DSL and would have to maintain your own structures.

Thanks,
Eno

> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>
> Following on from this thread, if I want to iterate over a KTable at the end
> of its hopping/tumbling Time Window, how can I do this at present myself? Is
> there a way to access these structures?
> If this is not possible, it would seem I need to duplicate and manage
> something similar to a list of windowed KTables myself, which is not really
> ideal.
> Thanks for any help,
> Clive
>
>
> On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> wrote:
>
>
> Hi Clive,
>
> For now this optimisation is not present. We're working on it as part of
> KIP-63. One manual work-around might be to use a simple key-value store to
> deduplicate the final output before sending to the backend. It could have a
> simple policy like "output all values at 1 second intervals" or "output after
> 10 records have been received".
>
> Eno
>
>
>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>
>>
>> Thanks Eno for your comments and references.
>> Perhaps I can explain what I want to achieve and maybe you can suggest the
>> correct topology?
>> I want to process a stream of events, do aggregation, and send to an
>> analytics backend (Influxdb), so that rather than sending 1000 points/sec to
>> the analytics backend, I send a much lower value. I'm only interested in
>> using the processing time of the event, so in that respect there are no "late
>> arriving" events. I was hoping I could use a Tumbling window which, when its
>> end-time had been passed, I can send the consolidated aggregation for that
>> window and then throw the Window away.
>>
>> It sounds like from the references you give that this is not possible at
>> present in Kafka Streams?
>>
>> Thanks,
>> Clive
>>
>> On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> wrote:
>>
>>
>> Hi Clive,
>>
>> The behaviour you are seeing is indeed correct (though not necessarily
>> optimal in terms of performance, as described in this JIRA:
>> https://issues.apache.org/jira/browse/KAFKA-3101)
>>
>> The key observation is that windows never close/complete. There could always
>> be late arriving events that appear long after a window's end interval, and
>> those need to be accounted for properly. In Kafka Streams that means that
>> such late arriving events continue to update the value of the window. As
>> described in the above JIRA, some optimisations could still be possible
>> (e.g., batch requests as described in KIP-63
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
>> however they are not implemented yet.
>>
>> So your code needs to handle each update.
>>
>> Thanks
>> Eno
>>
>>
>>
>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>>
>>> Hi,
>>> I would like to process a stream with a tumbling window of 5secs, create
>>> aggregated stats for keys and push the final aggregates at the end of each
>>> window period to an analytics backend. I have tried doing something like:
>>> stream
>>>   .map
>>>   .reduceByKey(...
>>>     , TimeWindows.of("mywindow", 5000L),...)
>>>   .foreach { send stats
>>>   }
>>> But I get every update to the ktable in the foreach.
>>> How do I just get the final values once the TumblingWindow is complete so I
>>> can iterate over them and send to some external system?
>>> Thanks,
>>> Clive
>>> PS Using kafka_2.10-0.10.0.0
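
For illustration, the manual work-around suggested in the thread (a simple key-value store that deduplicates updates before they reach the backend) might look roughly like the sketch below. It is plain Java with no Kafka dependency, and it is not taken from Kafka Streams: the class name WindowedDedupBuffer, its Result type, and the methods update and flushClosed are all hypothetical. It assumes processing-time semantics as Clive describes, so a window is treated as finished once the wall clock passes its end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not a Kafka Streams class): keeps only the latest aggregate per key and window. */
final class WindowedDedupBuffer {

    /** The latest aggregate seen for one key in one window. */
    static final class Result {
        final String key;
        final long windowStart;
        final long value;

        Result(String key, long windowStart, long value) {
            this.key = key;
            this.windowStart = windowStart;
            this.value = value;
        }
    }

    private final long windowSizeMs;

    // windowStart -> (key -> latest aggregate); TreeMap keeps windows ordered by start time.
    private final TreeMap<Long, Map<String, Long>> store = new TreeMap<>();

    WindowedDedupBuffer(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Record one update; a later update for the same key and window overwrites the earlier one. */
    void update(String key, long windowStart, long aggregate) {
        store.computeIfAbsent(windowStart, w -> new HashMap<>()).put(key, aggregate);
    }

    /** Remove and return the buffered results of every window whose end is at or before nowMs. */
    List<Result> flushClosed(long nowMs) {
        List<Result> out = new ArrayList<>();
        Iterator<Map.Entry<Long, Map<String, Long>>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Map<String, Long>> window = it.next();
            if (window.getKey() + windowSizeMs > nowMs) {
                break; // windows are ordered, so all remaining ones are still open
            }
            for (Map.Entry<String, Long> e : window.getValue().entrySet()) {
                out.add(new Result(e.getKey(), window.getKey(), e.getValue()));
            }
            it.remove();
        }
        return out;
    }
}
```

In this sketch, the foreach from the original question would call update for every KTable change, and a periodic timer would call flushClosed with the current time and send the returned results to the backend. Because windows never really close in Kafka Streams, an update that arrives after its window was flushed would be buffered again and emitted on a later flush; with pure processing-time aggregation that case should be rare, but the backend would still need to tolerate it.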