Eno,

I was also looking for something similar: outputting the aggregate value
once the window is "complete".
I'm not sure getting an individual update for every change to an aggregate
is that useful.
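
For the processing-time case Clive describes further down the thread (no
late arrivals, a tumbling window, emit once when the end time has passed), a
minimal sketch in plain Java might look like the following. It does not use
the Kafka Streams API. The class TumblingWindowCounter and its methods are
hypothetical names made up for illustration, and timestamps are passed in
explicitly so the logic is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, NOT part of Kafka Streams: per-key counts in processing-time tumbling windows. */
final class TumblingWindowCounter {
    private final long windowSizeMs;
    private long currentWindowStart = -1L;              // -1 means no window is open
    private final Map<String, Long> counts = new HashMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        this.windowSizeMs = windowSizeMs;
    }

    /**
     * Records one event seen at processing time nowMs (assumed non-negative).
     * Returns the final counts of the previous window if nowMs is past its end,
     * otherwise an empty map.
     */
    Map<String, Long> add(String key, long nowMs) {
        Map<String, Long> closed = flushIfExpired(nowMs);
        if (currentWindowStart < 0) {
            // Align the new window to a multiple of the window size.
            currentWindowStart = nowMs - (nowMs % windowSizeMs);
        }
        counts.merge(key, 1L, Long::sum);
        return closed;
    }

    /** Call periodically (e.g. from a timer) so a window is emitted even when no new events arrive. */
    Map<String, Long> flushIfExpired(long nowMs) {
        if (currentWindowStart < 0 || nowMs < currentWindowStart + windowSizeMs) {
            return new HashMap<>();
        }
        Map<String, Long> closed = new HashMap<>(counts);
        counts.clear();                                  // throw the window away
        currentWindowStart = -1L;
        return closed;
    }
}
```

Calling add(...) for each record and flushIfExpired(System.currentTimeMillis())
from a timer would give one map of final counts per window. This only holds
for processing time; with event time and late arrivals it would drop updates.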

With KIP-67, will we have access to both the Windowed[key] (key + timestamp)
and the value?
Does until() clear this store once that time has passed?

Srikanth

On Thu, Jun 30, 2016 at 4:27 AM, Clive Cox <clivej...@yahoo.co.uk.invalid>
wrote:

> Hi Eno,
> I've looked at KIP-67. It looks good but it's not clear what calls I would
> make to do what I presently need: Get access to each windowed store at some
> time soon after window end time. I can then use the methods specified to
> iterate over keys and values. Can you point me to the relevant
> method/technique for this?
>
> Thanks,
> Clive
>
>
>     On Tuesday, 28 June 2016, 12:47, Eno Thereska <eno.there...@gmail.com>
> wrote:
>
>
>  Hi Clive,
>
> As promised, here is the link to the KIP that just went out today.
> Feedback welcome:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
>
> Thanks
> Eno
>
> > On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com> wrote:
> >
> > Hi Clive,
> >
> > We are working on exposing the state store behind a KTable as part of
> allowing for queries to the structures currently hidden behind the language
> (DSL). The KIP should be out today or tomorrow for you to have a look. You
> can probably do what you need using the low-level processor API but then
> you'd lose the benefits of the DSL and would have to maintain your own
> structures.
> >
> > Thanks,
> > Eno
> >
> >> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID>
> wrote:
> >>
> >> Following on from this thread: if I want to iterate over a KTable at
> the end of its hopping/tumbling time window, how can I do this myself at
> present? Is there a way to access these structures?
> >> If this is not possible, it would seem I need to duplicate and manage
> something similar to a list of windowed KTables myself, which is not really
> ideal.
> >> Thanks for any help,
> >> Clive
> >>
> >>
> >>  On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com>
> wrote:
> >>
> >>
> >> Hi Clive,
> >>
> >> For now this optimisation is not present. We're working on it as part
> of KIP-63. One manual work-around might be to use a simple Key-value store
> to deduplicate the final output before sending to the backend. It could
> have a simple policy like "output all values at 1 second intervals" or
> "output after 10 records have been received".
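
A rough sketch of that kind of deduplicating buffer, in plain Java rather
than a real Kafka Streams state store. DedupBuffer, its constructor
parameters and the sink callback are hypothetical names for illustration;
the two flush policies mirror the examples above (a time interval or a
record count).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper, NOT a Kafka Streams class: keeps only the latest value per key between flushes. */
final class DedupBuffer<K, V> {
    private final Map<K, V> latest = new LinkedHashMap<>();
    private final int maxRecords;            // flush after this many updates...
    private final long flushIntervalMs;      // ...or once this much time has elapsed
    private final Consumer<Map<K, V>> sink;  // e.g. a writer to the analytics backend
    private int recordsSinceFlush = 0;
    private long lastFlushMs;

    DedupBuffer(int maxRecords, long flushIntervalMs, long startMs, Consumer<Map<K, V>> sink) {
        this.maxRecords = maxRecords;
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushMs = startMs;
        this.sink = sink;
    }

    /** Stores the newest value for key, overwriting older ones, and flushes if a policy triggers. */
    void update(K key, V value, long nowMs) {
        latest.put(key, value);
        recordsSinceFlush++;
        if (recordsSinceFlush >= maxRecords || nowMs - lastFlushMs >= flushIntervalMs) {
            flush(nowMs);
        }
    }

    /** Sends one deduplicated snapshot downstream (if non-empty) and resets the buffer. */
    void flush(long nowMs) {
        if (!latest.isEmpty()) {
            sink.accept(new LinkedHashMap<>(latest));
        }
        latest.clear();
        recordsSinceFlush = 0;
        lastFlushMs = nowMs;
    }
}
```

Each update from the windowed KTable would go through update(...), so the
backend sees at most one value per key per flush instead of every
intermediate update.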
> >>
> >> Eno
> >>
> >>
> >>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID>
> wrote:
> >>>
> >>>
> >>> Thanks Eno for your comments and references.
> >>> Perhaps, I can explain what I want to achieve and maybe you can
> suggest the correct topology?
> >>> I want to process a stream of events, aggregate them, and send the
> result to an analytics backend (InfluxDB), so that rather than sending 1000
> points/sec to the backend I send far fewer. I'm only interested in the
> processing time of each event, so in that respect there are no "late
> arriving" events. I was hoping I could use a tumbling window: once its end
> time has passed, I send the consolidated aggregation for that window and
> then throw the window away.
> >>>
> >>> From the references you give, it sounds like this is not possible at
> present in Kafka Streams?
> >>>
> >>> Thanks,
> >>> Clive
> >>>
> >>>    On Monday, 13 June 2016, 11:32, Eno Thereska <
> eno.there...@gmail.com> wrote:
> >>>
> >>>
> >>> Hi Clive,
> >>>
> >>> The behaviour you are seeing is indeed correct (though not necessarily
> optimal in terms of performance, as described in this JIRA:
> https://issues.apache.org/jira/browse/KAFKA-3101).
> >>>
> >>> The key observation is that windows never close/complete. There could
> always be late arriving events that appear long after a window's end
> interval and those need to be accounted for properly. In Kafka Streams that
> means that such late arriving events continue to update the value of the
> window. As described in the above JIRA, some optimisations could still be
> possible (e.g., batch requests as described in KIP-63 <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
> however they are not implemented yet.
> >>>
> >>> So your code needs to handle each update.
> >>>
> >>> Thanks
> >>> Eno
> >>>
> >>>
> >>>
> >>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID>
> wrote:
> >>>>
> >>>> Hi,
> >>>>  I would like to process a stream with a tumbling window of 5 seconds,
> create aggregated stats for keys and push the final aggregates at the end
> of each window period to an analytics backend. I have tried doing something
> like:
> >>>>  stream
> >>>>        .map(...)
> >>>>        .reduceByKey(..., TimeWindows.of("mywindow", 5000L), ...)
> >>>>        .foreach { send stats }
> >>>> But I get every update to the ktable in the foreach.
> >>>> How do I just get the final values once the TumblingWindow is
> complete so I can iterate over them and send to some external system?
> >>>> Thanks,
> >>>>  Clive
> >>>> PS Using kafka_2.10-0.10.0.0
> >>>>
> >>>
> >>>
> >>
> >>
> >
>
>
>
>
