Michael,

 > This allows Kafka Streams to retain old window buckets for a period of
time in order to wait for the late
arrival of records whose timestamps fall within the window interval. If a
record arrives
after the retention period has passed, the record cannot be processed and
is dropped.

So, how do we know if a window is being closed when until() is reached?
Do we get a callback of some sort when this happens? So that we can query
the final result for a window.

Otherwise, we have to keep reading the store say very n secs or m records.
These intermediate results may or may not be useful.
That's a lot of load but we still may miss the final computed value, which
is needed in most cases.

Srikanth

On Thu, Jul 14, 2016 at 5:04 AM, Michael Noll <mich...@confluent.io> wrote:

> Srikanth,
>
> > This would be useful in place where we use a key-value store just to
> > duplicate a KTable for get() operations.
> > Any rough idea when this is targeted for release?
>
> We are aiming to add the queryable state feature into the next release of
> Kafka.
>
>
> > Its still not clear how to use this for the case this thread was started
> for.
> > Does Kafka Stream keep windows alive forever?
> > At some point we need to "complete" a window rt?
>
> Kafka Streams keeps windows alive until the so-called window retention
> period expires.
>
> Excerpt from
>
> http://docs.confluent.io/3.0.0/streams/developer-guide.html#windowing-a-stream
> :
>
>     [For the DSL only]: A local state store is usually needed for a
> windowing operation
>     to store recently received records based on the window interval, while
> old records
>     in the store are purged after the specified window retention period.
> The retention time
>     can be set via `Windows#until()`.
>
> Excerpt from
>
> http://docs.confluent.io/3.0.0/streams/concepts.html#streams-concepts-windowing
> :
>
>     Windowing operations are available in the Kafka Streams DSL, where
> users can
>     specify a retention period for the window. This allows Kafka Streams to
> retain
>     old window buckets for a period of time in order to wait for the late
> arrival of records
>     whose timestamps fall within the window interval. If a record arrives
> after the retention
>     period has passed, the record cannot be processed and is dropped.
>
>
> > Either based on processing time or event time + watermark, etc.
>
> The time semantics are based on the timestamp extractor you have configured
> for your application.  The default timestamp extractor is
> `ConsumerRecordTimestampExtractor`, which yields event-time semantics.  If
> you want processing-time semantics, you need to configure your application
> to use the `WallclockTimestampExtractor`.
>
> Hope this helps,
> Michael
>
>
>
>
> On Wed, Jul 13, 2016 at 8:19 PM, Srikanth <srikanth...@gmail.com> wrote:
>
> > Thanks.
> >
> > This would be useful in place where we use a key-value store just to
> > duplicate a KTable for get() operations.
> > Any rough idea when this is targeted for release?
> >
> > Its still not clear how to use this for the case this thread was started
> > for.
> > Does Kafka Stream keep windows alive forever?
> > At some point we need to "complete" a window rt? Either based on
> processing
> > time or event time + watermark, etc.
> > How can we tie internal state store query with window completion? i.e,
> get
> > the final value.
> >
> > Srikanth
> >
> > On Thu, Jul 7, 2016 at 2:05 PM, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> >
> > > Hi Srikanth, Clive,
> > >
> > > Today we just added some example code usage in the KIP after feedback
> > from
> > > the community. There is code that shows how to access a WindowStore (in
> > > read-only mode).
> > >
> > > Thanks
> > > Eno
> > >
> > >
> > > > On 7 Jul 2016, at 15:57, Srikanth <srikanth...@gmail.com> wrote:
> > > >
> > > > Eno,
> > > >
> > > > I was also looking for something similar. To output aggregate value
> > once
> > > > the window is "complete".
> > > > I'm not sure getting individual update for an aggregate operator is
> > that
> > > > useful.
> > > >
> > > > With KIP-67, will we have access to Windowed[key]( key + timestamp)
> and
> > > > value?
> > > > Does until() clear this store when time passes?
> > > >
> > > > Srikanth
> > > >
> > > > On Thu, Jun 30, 2016 at 4:27 AM, Clive Cox
> > <clivej...@yahoo.co.uk.invalid
> > > >
> > > > wrote:
> > > >
> > > >> Hi Eno,
> > > >> I've looked at KIP-67. It looks good but its not clear what calls I
> > > would
> > > >> make to do what I presently need: Get access to each windowed store
> at
> > > some
> > > >> time soon after window end time. I can then use the methods
> specified
> > to
> > > >> iterate over keys and values. Can you point me to the relevant
> > > >> method/technique for this?
> > > >>
> > > >> Thanks,
> > > >> Clive
> > > >>
> > > >>
> > > >>    On Tuesday, 28 June 2016, 12:47, Eno Thereska <
> > > eno.there...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>
> > > >> Hi Clive,
> > > >>
> > > >> As promised, here is the link to the KIP that just went out today.
> > > >> Feedback welcome:
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> > > >> <
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67:+Queryable+state+for+Kafka+Streams
> > > >>>
> > > >>
> > > >> Thanks
> > > >> Eno
> > > >>
> > > >>> On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com>
> > wrote:
> > > >>>
> > > >>> Hi Clive,
> > > >>>
> > > >>> We are working on exposing the state store behind a KTable as part
> of
> > > >> allowing for queries to the structures currently hidden behind the
> > > language
> > > >> (DSL). The KIP should be out today or tomorrow for you to have a
> look.
> > > You
> > > >> can probably do what you need using the low-level processor API but
> > then
> > > >> you'd lose the benefits of the DSL and would have to maintain your
> own
> > > >> structures.
> > > >>>
> > > >>> Thanks,
> > > >>> Eno
> > > >>>
> > > >>>> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID
> >
> > > >> wrote:
> > > >>>>
> > > >>>> Following on from this thread, if I want to iterate over a KTable
> at
> > > >> the end of its hopping/tumbling Time Window how can I do this at
> > present
> > > >> myself? Is there a way to access these structures?
> > > >>>> If this is not possible it would seem I need to duplicate and
> manage
> > > >> something similar to a list of windowed KTables myself which is not
> > > really
> > > >> ideal.
> > > >>>> Thanks for any help,
> > > >>>> Clive
> > > >>>>
> > > >>>>
> > > >>>> On Monday, 13 June 2016, 16:03, Eno Thereska <
> > eno.there...@gmail.com>
> > > >> wrote:
> > > >>>>
> > > >>>>
> > > >>>> Hi Clive,
> > > >>>>
> > > >>>> For now this optimisation is not present. We're working on it as
> > part
> > > >> of KIP-63. One manual work-around might be to use a simple Key-value
> > > store
> > > >> to deduplicate the final output before sending to the backend. It
> > could
> > > >> have a simple policy like "output all values at 1 second intervals"
> or
> > > >> "output after 10 records have been received".
> > > >>>>
> > > >>>> Eno
> > > >>>>
> > > >>>>
> > > >>>>> On 13 Jun 2016, at 13:36, Clive Cox
> <clivej...@yahoo.co.uk.INVALID
> > >
> > > >> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks Eno for your comments and references.
> > > >>>>> Perhaps, I can explain what I want to achieve and maybe you can
> > > >> suggest the correct topology?
> > > >>>>> I want process a stream of events and do aggregation and send to
> an
> > > >> analytics backend (Influxdb), so that rather than sending 1000
> > > points/sec
> > > >> to the analytics backend, I send a much lower value. I'm only
> > > interested in
> > > >> using the processing time of the event so in that respect there are
> no
> > > >> "late arriving" events.I was hoping I could use a Tumbling window
> > which
> > > >> when its end-time had been passed I can send the consolidated
> > > aggregation
> > > >> for that window and then throw the Window away.
> > > >>>>>
> > > >>>>> It sounds like from the references you give that this is not
> > possible
> > > >> at present in Kafka Streams?
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Clive
> > > >>>>>
> > > >>>>>   On Monday, 13 June 2016, 11:32, Eno Thereska <
> > > >> eno.there...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>> Hi Clive,
> > > >>>>>
> > > >>>>> The behaviour you are seeing is indeed correct (though not
> > > necessarily
> > > >> optimal in terms of performance as described in this JIRA:
> > > >> https://issues.apache.org/jira/browse/KAFKA-3101 <
> > > >> https://issues.apache.org/jira/browse/KAFKA-3101>)
> > > >>>>>
> > > >>>>> The key observation is that windows never close/complete. There
> > could
> > > >> always be late arriving events that appear long after a window's end
> > > >> interval and those need to be accounted for properly. In Kafka
> Streams
> > > that
> > > >> means that such late arriving events continue to update the value of
> > the
> > > >> window. As described in the above JIRA, some optimisations could
> still
> > > be
> > > >> possible (e.g., batch requests as described in KIP-63 <
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams
> > > >),
> > > >> however they are not implemented yet.
> > > >>>>>
> > > >>>>> So your code needs to handle each update.
> > > >>>>>
> > > >>>>> Thanks
> > > >>>>> Eno
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On 13 Jun 2016, at 11:13, Clive Cox
> <clivej...@yahoo.co.uk.INVALID
> > >
> > > >> wrote:
> > > >>>>>>
> > > >>>>>> Hi,
> > > >>>>>> I would like to process a stream with a tumbling window of
> 5secs,
> > > >> create aggregated stats for keys and push the final aggregates at
> the
> > > end
> > > >> of each window period to a analytics backend. I have tried doing
> > > something
> > > >> like:
> > > >>>>>> stream
> > > >>>>>>       .map
> > > >>>>>>       .reduceByKey(...
> > > >>>>>>         , TimeWindows.of("mywindow", 5000L),...)
> > > >>>>>>       .foreach        {            send stats
> > > >>>>>>         }
> > > >>>>>> But I get every update to the ktable in the foreach.
> > > >>>>>> How do I just get the final values once the TumblingWindow is
> > > >> complete so I can iterate over them and send to some external
> system?
> > > >>>>>> Thanks,
> > > >>>>>> Clive
> > > >>>>>> PS Using kafka_2.10-0.10.0.0
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
>
> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860Download
> Apache Kafka and Confluent Platform: www.confluent.io/download
> <http://www.confluent.io/download>*
>

Reply via email to