Eno, I was also looking for something similar. To output aggregate value once the window is "complete". I'm not sure getting individual update for an aggregate operator is that useful.
With KIP-67, will we have access to Windowed[key]( key + timestamp) and value? Does until() clear this store when time passes? Srikanth On Thu, Jun 30, 2016 at 4:27 AM, Clive Cox <clivej...@yahoo.co.uk.invalid> wrote: > Hi Eno, > I've looked at KIP-67. It looks good but its not clear what calls I would > make to do what I presently need: Get access to each windowed store at some > time soon after window end time. I can then use the methods specified to > iterate over keys and values. Can you point me to the relevant > method/technique for this? > > Thanks, > Clive > > > On Tuesday, 28 June 2016, 12:47, Eno Thereska <eno.there...@gmail.com> > wrote: > > > Hi Clive, > > As promised, here is the link to the KIP that just went out today. > Feedback welcome: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams > < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-67:+Queryable+state+for+Kafka+Streams > > > > Thanks > Eno > > > On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com> wrote: > > > > Hi Clive, > > > > We are working on exposing the state store behind a KTable as part of > allowing for queries to the structures currently hidden behind the language > (DSL). The KIP should be out today or tomorrow for you to have a look. You > can probably do what you need using the low-level processor API but then > you'd lose the benefits of the DSL and would have to maintain your own > structures. > > > > Thanks, > > Eno > > > >> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID> > wrote: > >> > >> Following on from this thread, if I want to iterate over a KTable at > the end of its hopping/tumbling Time Window how can I do this at present > myself? Is there a way to access these structures? > >> If this is not possible it would seem I need to duplicate and manage > something similar to a list of windowed KTables myself which is not really > ideal. > >> Thanks for any help, > >> Clive > >> > >> > >> On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> > wrote: > >> > >> > >> Hi Clive, > >> > >> For now this optimisation is not present. We're working on it as part > of KIP-63. One manual work-around might be to use a simple Key-value store > to deduplicate the final output before sending to the backend. It could > have a simple policy like "output all values at 1 second intervals" or > "output after 10 records have been received". > >> > >> Eno > >> > >> > >>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> > wrote: > >>> > >>> > >>> Thanks Eno for your comments and references. > >>> Perhaps, I can explain what I want to achieve and maybe you can > suggest the correct topology? > >>> I want process a stream of events and do aggregation and send to an > analytics backend (Influxdb), so that rather than sending 1000 points/sec > to the analytics backend, I send a much lower value. I'm only interested in > using the processing time of the event so in that respect there are no > "late arriving" events.I was hoping I could use a Tumbling window which > when its end-time had been passed I can send the consolidated aggregation > for that window and then throw the Window away. > >>> > >>> It sounds like from the references you give that this is not possible > at present in Kafka Streams? > >>> > >>> Thanks, > >>> Clive > >>> > >>> On Monday, 13 June 2016, 11:32, Eno Thereska < > eno.there...@gmail.com> wrote: > >>> > >>> > >>> Hi Clive, > >>> > >>> The behaviour you are seeing is indeed correct (though not necessarily > optimal in terms of performance as described in this JIRA: > https://issues.apache.org/jira/browse/KAFKA-3101 < > https://issues.apache.org/jira/browse/KAFKA-3101>) > >>> > >>> The key observation is that windows never close/complete. There could > always be late arriving events that appear long after a window's end > interval and those need to be accounted for properly. In Kafka Streams that > means that such late arriving events continue to update the value of the > window. As described in the above JIRA, some optimisations could still be > possible (e.g., batch requests as described in KIP-63 < > https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>), > however they are not implemented yet. > >>> > >>> So your code needs to handle each update. > >>> > >>> Thanks > >>> Eno > >>> > >>> > >>> > >>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> > wrote: > >>>> > >>>> Hi, > >>>> I would like to process a stream with a tumbling window of 5secs, > create aggregated stats for keys and push the final aggregates at the end > of each window period to a analytics backend. I have tried doing something > like: > >>>> stream > >>>> .map > >>>> .reduceByKey(... > >>>> , TimeWindows.of("mywindow", 5000L),...) > >>>> .foreach { send stats > >>>> } > >>>> But I get every update to the ktable in the foreach. > >>>> How do I just get the final values once the TumblingWindow is > complete so I can iterate over them and send to some external system? > >>>> Thanks, > >>>> Clive > >>>> PS Using kafka_2.10-0.10.0.0 > >>>> > >>> > >>> > >> > >> > > > > > >