Hi Eno, I've looked at KIP-67. It looks good but its not clear what calls I would make to do what I presently need: Get access to each windowed store at some time soon after window end time. I can then use the methods specified to iterate over keys and values. Can you point me to the relevant method/technique for this?
Thanks, Clive On Tuesday, 28 June 2016, 12:47, Eno Thereska <eno.there...@gmail.com> wrote: Hi Clive, As promised, here is the link to the KIP that just went out today. Feedback welcome: https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams <https://cwiki.apache.org/confluence/display/KAFKA/KIP-67:+Queryable+state+for+Kafka+Streams> Thanks Eno > On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com> wrote: > > Hi Clive, > > We are working on exposing the state store behind a KTable as part of > allowing for queries to the structures currently hidden behind the language > (DSL). The KIP should be out today or tomorrow for you to have a look. You > can probably do what you need using the low-level processor API but then > you'd lose the benefits of the DSL and would have to maintain your own > structures. > > Thanks, > Eno > >> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote: >> >> Following on from this thread, if I want to iterate over a KTable at the end >> of its hopping/tumbling Time Window how can I do this at present myself? Is >> there a way to access these structures? >> If this is not possible it would seem I need to duplicate and manage >> something similar to a list of windowed KTables myself which is not really >> ideal. >> Thanks for any help, >> Clive >> >> >> On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> wrote: >> >> >> Hi Clive, >> >> For now this optimisation is not present. We're working on it as part of >> KIP-63. One manual work-around might be to use a simple Key-value store to >> deduplicate the final output before sending to the backend. It could have a >> simple policy like "output all values at 1 second intervals" or "output >> after 10 records have been received". >> >> Eno >> >> >>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote: >>> >>> >>> Thanks Eno for your comments and references. >>> Perhaps, I can explain what I want to achieve and maybe you can suggest the >>> correct topology? >>> I want process a stream of events and do aggregation and send to an >>> analytics backend (Influxdb), so that rather than sending 1000 points/sec >>> to the analytics backend, I send a much lower value. I'm only interested in >>> using the processing time of the event so in that respect there are no >>> "late arriving" events.I was hoping I could use a Tumbling window which >>> when its end-time had been passed I can send the consolidated aggregation >>> for that window and then throw the Window away. >>> >>> It sounds like from the references you give that this is not possible at >>> present in Kafka Streams? >>> >>> Thanks, >>> Clive >>> >>> On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> >>>wrote: >>> >>> >>> Hi Clive, >>> >>> The behaviour you are seeing is indeed correct (though not necessarily >>> optimal in terms of performance as described in this JIRA: >>> https://issues.apache.org/jira/browse/KAFKA-3101 >>> <https://issues.apache.org/jira/browse/KAFKA-3101>) >>> >>> The key observation is that windows never close/complete. There could >>> always be late arriving events that appear long after a window's end >>> interval and those need to be accounted for properly. In Kafka Streams that >>> means that such late arriving events continue to update the value of the >>> window. As described in the above JIRA, some optimisations could still be >>> possible (e.g., batch requests as described in KIP-63 >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>), >>> however they are not implemented yet. >>> >>> So your code needs to handle each update. >>> >>> Thanks >>> Eno >>> >>> >>> >>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote: >>>> >>>> Hi, >>>> I would like to process a stream with a tumbling window of 5secs, create >>>>aggregated stats for keys and push the final aggregates at the end of each >>>>window period to a analytics backend. I have tried doing something like: >>>> stream >>>> .map >>>> .reduceByKey(... >>>> , TimeWindows.of("mywindow", 5000L),...) >>>> .foreach { send stats >>>> } >>>> But I get every update to the ktable in the foreach. >>>> How do I just get the final values once the TumblingWindow is complete so >>>> I can iterate over them and send to some external system? >>>> Thanks, >>>> Clive >>>> PS Using kafka_2.10-0.10.0.0 >>>> >>> >>> >> >> >