Hi Eno,
I've looked at KIP-67. It looks good but its not clear what calls I would make 
to do what I presently need: Get access to each windowed store at some time 
soon after window end time. I can then use the methods specified to iterate 
over keys and values. Can you point me to the relevant method/technique for 
this?

Thanks,
Clive
 

    On Tuesday, 28 June 2016, 12:47, Eno Thereska <eno.there...@gmail.com> 
wrote:
 

 Hi Clive,

As promised, here is the link to the KIP that just went out today. Feedback 
welcome:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
 
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-67:+Queryable+state+for+Kafka+Streams>

Thanks
Eno

> On 27 Jun 2016, at 20:56, Eno Thereska <eno.there...@gmail.com> wrote:
> 
> Hi Clive,
> 
> We are working on exposing the state store behind a KTable as part of 
> allowing for queries to the structures currently hidden behind the language 
> (DSL). The KIP should be out today or tomorrow for you to have a look. You 
> can probably do what you need using the low-level processor API but then 
> you'd lose the benefits of the DSL and would have to maintain your own 
> structures.
> 
> Thanks,
> Eno
> 
>> On 26 Jun 2016, at 18:42, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>> 
>> Following on from this thread, if I want to iterate over a KTable at the end 
>> of its hopping/tumbling Time Window how can I do this at present myself? Is 
>> there a way to access these structures?
>> If this is not possible it would seem I need to duplicate and manage 
>> something similar to a list of windowed KTables myself which is not really 
>> ideal.
>> Thanks for any help,
>> Clive
>> 
>> 
>>  On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> wrote:
>> 
>> 
>> Hi Clive,
>> 
>> For now this optimisation is not present. We're working on it as part of 
>> KIP-63. One manual work-around might be to use a simple Key-value store to 
>> deduplicate the final output before sending to the backend. It could have a 
>> simple policy like "output all values at 1 second intervals" or "output 
>> after 10 records have been received".
>> 
>> Eno
>> 
>> 
>>> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>> 
>>> 
>>> Thanks Eno for your comments and references.
>>> Perhaps, I can explain what I want to achieve and maybe you can suggest the 
>>> correct topology?
>>> I want process a stream of events and do aggregation and send to an 
>>> analytics backend (Influxdb), so that rather than sending 1000 points/sec 
>>> to the analytics backend, I send a much lower value. I'm only interested in 
>>> using the processing time of the event so in that respect there are no 
>>> "late arriving" events.I was hoping I could use a Tumbling window which 
>>> when its end-time had been passed I can send the consolidated aggregation 
>>> for that window and then throw the Window away. 
>>> 
>>> It sounds like from the references you give that this is not possible at 
>>> present in Kafka Streams?
>>> 
>>> Thanks,
>>> Clive 
>>> 
>>>    On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> 
>>>wrote:
>>> 
>>> 
>>> Hi Clive,
>>> 
>>> The behaviour you are seeing is indeed correct (though not necessarily 
>>> optimal in terms of performance as described in this JIRA: 
>>> https://issues.apache.org/jira/browse/KAFKA-3101 
>>> <https://issues.apache.org/jira/browse/KAFKA-3101>)
>>> 
>>> The key observation is that windows never close/complete. There could 
>>> always be late arriving events that appear long after a window's end 
>>> interval and those need to be accounted for properly. In Kafka Streams that 
>>> means that such late arriving events continue to update the value of the 
>>> window. As described in the above JIRA, some optimisations could still be 
>>> possible (e.g., batch requests as described in KIP-63 
>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
>>>  however they are not implemented yet.
>>> 
>>> So your code needs to handle each update.
>>> 
>>> Thanks
>>> Eno
>>> 
>>> 
>>> 
>>>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>>>> 
>>>> Hi,
>>>>  I would like to process a stream with a tumbling window of 5secs, create 
>>>>aggregated stats for keys and push the final aggregates at the end of each 
>>>>window period to a analytics backend. I have tried doing something like:
>>>>  stream
>>>>        .map
>>>>        .reduceByKey(...
>>>>          , TimeWindows.of("mywindow", 5000L),...)
>>>>        .foreach        {            send stats
>>>>          }
>>>> But I get every update to the ktable in the foreach.
>>>> How do I just get the final values once the TumblingWindow is complete so 
>>>> I can iterate over them and send to some external system?
>>>> Thanks,
>>>>  Clive
>>>> PS Using kafka_2.10-0.10.0.0
>>>> 
>>> 
>>> 
>> 
>> 
> 


  

Reply via email to