Re: Best approach for recalculating statistics based on amended or deleted events?

Timo Walther Tue, 04 Feb 2020 07:27:56 -0800

Hi Stephen,

the use cases you are describing sound like a perfect fit for Flink. Internally, Flink deals with insertions and deletions that flow through the system, and it can update chained aggregations and complex queries accordingly.
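To make "insertions and deletions flowing through chained aggregations" concrete, here is a small plain-Python model of the idea. It is not Flink API; the row layout and function names are hypothetical. Each changelog row carries "+" (insert) or "-" (retract). The first aggregation counts actions per user and, whenever a count changes, retracts its old result row and inserts the new one; a second aggregation consumes those rows and counts how many users currently have each count value.

```python
from collections import defaultdict

# Hypothetical changelog format: (op, key) or (op, key, value),
# where op is "+" for an insertion and "-" for a retraction.

def count_per_key(changelog):
    """First aggregation: COUNT(*) GROUP BY key, emitting its own changelog."""
    counts = defaultdict(int)
    out = []
    for op, key in changelog:
        old = counts[key]
        counts[key] += 1 if op == "+" else -1
        new = counts[key]
        if old > 0:
            out.append(("-", key, old))  # retract the previous result row
        if new > 0:
            out.append(("+", key, new))  # insert the updated result row
    return out

def users_per_count(changelog):
    """Second aggregation chained on the first: COUNT(*) GROUP BY count value."""
    hist = defaultdict(int)
    for op, _user, cnt in changelog:
        hist[cnt] += 1 if op == "+" else -1
    return {cnt: n for cnt, n in hist.items() if n > 0}

events = [("+", "alice"), ("+", "alice"), ("+", "bob"), ("-", "alice")]
level1 = count_per_key(events)
print(level1)
print(users_per_count(level1))  # {1: 2}
```

The retraction of one of alice's actions turns into a "-" row for her old count and a "+" row for the new one, so the second level corrects itself without reprocessing all input.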

The only major limitation at the moment is that we only support sources that emit insert-only rows. The community is currently working on designing how we expose the internal changelog processing capabilities through our APIs.

However, your use case might also work with insert-only rows and a query based on the flags in the data, correct?
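A rough sketch of the insert-only approach, assuming (hypothetically) that every event carries an `event_id`, a `version` that increases with each amendment, and a `deleted` flag. The source stays append-only; the query keeps only the latest version of each event id, drops those flagged as deleted, and then aggregates. In SQL terms this corresponds roughly to a "latest row per key" deduplication followed by a GROUP BY; it is modelled here in plain Python over a finite list, not with Flink APIs.

```python
from collections import Counter
from typing import NamedTuple

class Event(NamedTuple):
    event_id: str          # hypothetical field names
    version: int           # higher version = later amendment
    user: str
    deleted: bool = False  # hypothetical deletion flag

def current_counts(log):
    """Count live actions per user from an insert-only log of versioned events."""
    latest = {}
    for ev in log:
        # Keep only the newest version of each event id.
        if ev.event_id not in latest or ev.version > latest[ev.event_id].version:
            latest[ev.event_id] = ev
    # Ignore events whose latest version is flagged as deleted.
    return Counter(ev.user for ev in latest.values() if not ev.deleted)

log = [
    Event("e1", 1, "alice"),
    Event("e2", 1, "alice"),
    Event("e3", 1, "bob"),
    Event("e2", 2, "bob"),                # amendment: e2 was really bob's action
    Event("e3", 2, "bob", deleted=True),  # deletion of e3
]
print(current_counts(log))  # Counter({'alice': 1, 'bob': 1})
```

Because amendments and deletions are just new rows, nothing upstream needs to support retractions; the cost is that the query must remember the latest version of every event id.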


Regards,
Timo


On 04.02.20 16:14, Stephen Young wrote:

I am currently looking into how Flink can support a live data collection 
platform. We want to collect certain data in real time. This data will be sent
to Kafka and we want to use Flink to calculate statistics and derived events 
from it.

An important thing we need to be able to handle is amendment or deletion 
events. For example, we may get an event that someone has performed an action 
and from this we'd calculate how many of these actions they had taken in total. 
We'd also build calculations on top of that, for example top 10 rankings by 
these counts, or arbitrarily many layers of calculations beyond that. But 
some time later (this could be a few seconds or a week), we receive an amendment
event to that action. This indicates that the action was taken by a different 
person or from a different location. We then need Flink to recalculate all of
our downstream stats, i.e., the counts need to be changed and the rankings need
to be adjusted.

From my research into Flink, I can see there is a page about Dynamic Tables and
also there was some stuff about retraction support for the Table/SQL API. But 
it seems like this is simply how Flink models changes to aggregated data. I 
would like to be able to do something like calculate a count from a set of 
events each with their own id, then retract one of those events by its id and 
have the count automatically change.
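The "retract an event by its id" behaviour can be modelled by remembering, per event id, which user the event is currently attributed to: a deletion that carries only the id is turned into a retraction of the stored row, and an amendment into a retraction of the old row plus an insertion of the new one. The plain-Python sketch below (class and method names are hypothetical, not Flink API) updates the per-user counts and a top-N ranking incrementally. In a Flink job, this per-id lookup would presumably need to live in state keyed by event id; that is an assumption about how one might build it, not a description of a built-in feature.

```python
from collections import Counter

class RetractableCounter:
    """Hypothetical model: per-user counts where events can be amended or deleted by id."""

    def __init__(self):
        self.by_id = {}          # event id -> user currently attributed to it
        self.counts = Counter()  # user -> number of live events

    def insert(self, event_id, user):
        self.by_id[event_id] = user
        self.counts[user] += 1

    def delete(self, event_id):
        # The delete only carries the id, so look up what must be retracted.
        user = self.by_id.pop(event_id)
        self.counts[user] -= 1
        if self.counts[user] == 0:
            del self.counts[user]

    def amend(self, event_id, new_user):
        # An amendment is a retraction of the old row plus an insertion of the new one.
        self.delete(event_id)
        self.insert(event_id, new_user)

    def top(self, n):
        # Downstream ranking derived from the current counts.
        return self.counts.most_common(n)

c = RetractableCounter()
c.insert("e1", "alice")
c.insert("e2", "alice")
c.insert("e3", "bob")
print(c.top(10))      # [('alice', 2), ('bob', 1)]
c.amend("e2", "bob")  # e2 turns out to be bob's action
c.delete("e1")        # e1 is retracted entirely
print(c.top(10))      # [('bob', 2)]
```

Deleting an id that was never inserted raises a `KeyError` here; a real pipeline would have to decide how to treat such late or unknown retractions.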


Is anything like this achievable with Flink? Thanks!
