Hi Clive,

We are working on exposing the state store behind a KTable, as part of allowing 
queries against the structures currently hidden behind the DSL. The KIP should 
be out today or tomorrow for you to have a look. You can probably do 
what you need using the low-level Processor API, but then you'd lose the 
benefits of the DSL and would have to maintain your own structures.
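
As a rough illustration of what "maintaining your own structures" could look 
like, here is a minimal sketch in plain Java. It is not Kafka Streams code: 
WindowFlushBuffer, update and flushClosed are hypothetical names, and it 
assumes string keys, long aggregates and processing time only. It keeps just 
the latest aggregate per key for each tumbling window and hands a window back 
once its end time has passed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Kafka Streams.
final class WindowFlushBuffer {
    private final long windowSizeMs;
    // window start (ms) -> (key -> latest aggregate seen for that key)
    private final TreeMap<Long, Map<String, Long>> windows = new TreeMap<>();

    WindowFlushBuffer(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Record an update; a later update for the same window and key
    // overwrites the earlier one, which is the deduplication step.
    void update(long windowStartMs, String key, long aggregate) {
        windows.computeIfAbsent(windowStartMs, w -> new HashMap<>())
               .put(key, aggregate);
    }

    // Remove and return every window whose end (start + size) is <= nowMs.
    TreeMap<Long, Map<String, Long>> flushClosed(long nowMs) {
        NavigableMap<Long, Map<String, Long>> closedView =
                windows.headMap(nowMs - windowSizeMs, true);
        TreeMap<Long, Map<String, Long>> closed = new TreeMap<>(closedView);
        closedView.clear(); // also removes the entries from 'windows'
        return closed;
    }
}
```

With a 5000 ms window, updating key "a" in the window starting at 0 with 1 and 
then 2, and then calling flushClosed(5000), would return only the value 2 for 
that window. An update arriving for that window after the flush would start a 
fresh entry, so this only suits cases where late arrivals are not a concern.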

Thanks,
Eno

> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
> 
> Following on from this thread, if I want to iterate over a KTable at the end 
> of its hopping/tumbling Time Window how can I do this at present myself? Is 
> there a way to access these structures?
> If this is not possible, it would seem I need to duplicate and manage 
> something similar to a list of windowed KTables myself, which is not really 
> ideal.
> Thanks for any help,
>  Clive
> 
> 
>    On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> 
> wrote:
> 
> 
> Hi Clive,
> 
> For now this optimisation is not present. We're working on it as part of 
> KIP-63. One manual work-around might be to use a simple key-value store to 
> deduplicate the final output before sending to the backend. It could have a 
> simple policy like "output all values at 1 second intervals" or "output after 
> 10 records have been received".
> 
> Eno
> 
> 
>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>> 
>> 
>> Thanks Eno for your comments and references.
>> Perhaps I can explain what I want to achieve and maybe you can suggest the 
>> correct topology?
>> I want to process a stream of events, aggregate them, and send the results to an 
>> analytics backend (InfluxDB), so that rather than sending 1000 points/sec to 
>> the analytics backend, I send far fewer. I'm only interested in 
>> the processing time of each event, so in that respect there are no "late 
>> arriving" events. I was hoping I could use a tumbling window and, once its 
>> end time has passed, send the consolidated aggregation for that 
>> window and then throw the window away.
>> 
>> From the references you give, it sounds like this is not possible at 
>> present in Kafka Streams?
>> 
>> Thanks,
>> Clive 
>> 
>>     On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> 
>> wrote:
>> 
>> 
>> Hi Clive,
>> 
>> The behaviour you are seeing is indeed correct (though not necessarily 
>> optimal in terms of performance, as described in this JIRA: 
>> https://issues.apache.org/jira/browse/KAFKA-3101).
>> 
>> The key observation is that windows never close/complete. There could always 
>> be late arriving events that appear long after a window's end interval and 
>> those need to be accounted for properly. In Kafka Streams that means that 
>> such late arriving events continue to update the value of the window. As 
>> described in the above JIRA, some optimisations could still be possible 
>> (e.g., batch requests as described in KIP-63 
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
>>  however they are not implemented yet.
>> 
>> So your code needs to handle each update.
>> 
>> Thanks
>> Eno
>> 
>> 
>> 
>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>> 
>>> Hi,
>>>   I would like to process a stream with a tumbling window of 5 seconds, create 
>>> aggregated stats for keys and push the final aggregates at the end of each 
>>> window period to an analytics backend. I have tried doing something like:
>>>   stream
>>>         .map
>>>         .reduceByKey(...
>>>           , TimeWindows.of("mywindow", 5000L),...)
>>>         .foreach { send stats }
>>> But I get every update to the KTable in the foreach.
>>> How do I just get the final values once the TumblingWindow is complete so I 
>>> can iterate over them and send to some external system?
>>> Thanks,
>>>   Clive
>>> PS Using kafka_2.10-0.10.0.0
>>> 
>> 
>> 
> 
> 
