Following on from this thread, if I want to iterate over a KTable at the end of 
its hopping/tumbling Time Window how can I do this at present myself? Is there 
a way to access these structures?
If this is not possible it would seem I need to duplicate and manage something 
similar to a list of windowed KTables myself which is not really ideal.
Thanks for any help,
 Clive
 

    On Monday, 13 June 2016, 16:03, Eno Thereska <eno.there...@gmail.com> wrote:
 

 Hi Clive,

For now this optimisation is not present. We're working on it as part of 
KIP-63. One manual work-around might be to use a simple Key-value store to 
deduplicate the final output before sending to the backend. It could have a 
simple policy like "output all values at 1 second intervals" or "output after 
10 records have been received".

Eno


> On 13 Jun 2016, at 13:36, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
> 
> 
> Thanks Eno for your comments and references.
> Perhaps, I can explain what I want to achieve and maybe you can suggest the 
> correct topology?
> I want process a stream of events and do aggregation and send to an analytics 
> backend (Influxdb), so that rather than sending 1000 points/sec to the 
> analytics backend, I send a much lower value. I'm only interested in using 
> the processing time of the event so in that respect there are no "late 
> arriving" events.I was hoping I could use a Tumbling window which when its 
> end-time had been passed I can send the consolidated aggregation for that 
> window and then throw the Window away. 
> 
> It sounds like from the references you give that this is not possible at 
> present in Kafka Streams?
> 
> Thanks,
> Clive 
> 
>    On Monday, 13 June 2016, 11:32, Eno Thereska <eno.there...@gmail.com> 
>wrote:
> 
> 
> Hi Clive,
> 
> The behaviour you are seeing is indeed correct (though not necessarily 
> optimal in terms of performance as described in this JIRA: 
> https://issues.apache.org/jira/browse/KAFKA-3101 
> <https://issues.apache.org/jira/browse/KAFKA-3101>)
> 
> The key observation is that windows never close/complete. There could always 
> be late arriving events that appear long after a window's end interval and 
> those need to be accounted for properly. In Kafka Streams that means that 
> such late arriving events continue to update the value of the window. As 
> described in the above JIRA, some optimisations could still be possible 
> (e.g., batch requests as described in KIP-63 
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-63:+Unify+store+and+downstream+caching+in+streams>),
>  however they are not implemented yet.
> 
> So your code needs to handle each update.
> 
> Thanks
> Eno
> 
> 
> 
>> On 13 Jun 2016, at 11:13, Clive Cox <clivej...@yahoo.co.uk.INVALID> wrote:
>> 
>> Hi,
>>  I would like to process a stream with a tumbling window of 5secs, create 
>>aggregated stats for keys and push the final aggregates at the end of each 
>>window period to a analytics backend. I have tried doing something like:
>>  stream
>>        .map
>>        .reduceByKey(...
>>          , TimeWindows.of("mywindow", 5000L),...)
>>        .foreach        {            send stats
>>          }
>> But I get every update to the ktable in the foreach.
>> How do I just get the final values once the TumblingWindow is complete so I 
>> can iterate over them and send to some external system?
>> Thanks,
>>  Clive
>> PS Using kafka_2.10-0.10.0.0
>> 
> 
> 


  

Reply via email to