Hello,

I'm getting started with Flink for a use case that could leverage its
window-processing abilities, which Spark does not offer.

Basically, I have dumps of time-series data (10 years of ticks) over which I
need to calculate many metrics based on event time. Note: I don't know the
metrics beforehand; this is going to be an exploratory and iterative data
analytics effort.

Flink doesn't seem to support windows in batch processing, so I'm thinking
about emulating batch by using the Kafka stream connector and rewinding the
data stream for every new metric I calculate, so that the full time series
is processed as one batch.
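
To make concrete what each per-metric job would compute, here is the
event-time tumbling-window step sketched in plain Java, with no Flink
dependency: the `Tick` type, the window size, and averaging as the metric
are all placeholders I made up for illustration. In Flink this would map to
a keyed stream with an event-time window and an aggregate function.

```java
import java.util.*;
import java.util.stream.*;

public class WindowSketch {
    // A tick: event timestamp (epoch millis) and a value. Placeholder type.
    record Tick(long ts, double price) {}

    // Assign each tick to a tumbling event-time window of the given size
    // and average the values per window, keyed by window start.
    static Map<Long, Double> tumblingAvg(List<Tick> ticks, long windowMillis) {
        return ticks.stream().collect(Collectors.groupingBy(
                t -> (t.ts() / windowMillis) * windowMillis, // window start
                TreeMap::new,
                Collectors.averagingDouble(Tick::price)));
    }

    public static void main(String[] args) {
        List<Tick> ticks = List.of(
                new Tick(0, 10.0), new Tick(500, 20.0), // window [0, 1000)
                new Tick(1500, 30.0));                  // window [1000, 2000)
        System.out.println(tumblingAvg(ticks, 1000));
        // {0=15.0, 1000=30.0}
    }
}
```

The point being: every window assignment is driven purely by the event
timestamp in the record, never by arrival time, which is why replaying the
topic from the beginning should give the same result each rewind.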

Each metric I calculate would in turn be sent to another Kafka topic so I
can use it as input to a subsequent processing batch, e.g.

Iteration 1)   raw timeseries data ---> metric1
Iteration 2)   raw timeseries data + metric1 (composite) ---> metric2
Iteration 3)   metric1 + metric2 ---> metric3
Iteration 4)   raw timeseries data + metric3 ---> metric4
...
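
The chain above, with Kafka topics stood in by in-memory lists and made-up
placeholder formulas for the metrics, would look roughly like this (the
first two iterations only):

```java
import java.util.*;
import java.util.stream.*;

public class IterativePipeline {
    // "Topics" stand in for Kafka topics; each holds one value per window.
    // The metric formulas are placeholders, not real analytics.
    static Map<String, List<Double>> run(List<Double> raw) {
        Map<String, List<Double>> topics = new LinkedHashMap<>();
        topics.put("raw", raw);

        // Iteration 1: raw -> metric1 (placeholder: doubled values)
        List<Double> m1 = raw.stream().map(v -> v * 2).toList();
        topics.put("metric1", m1);

        // Iteration 2: raw + metric1 -> metric2 (placeholder: element-wise sum)
        topics.put("metric2", IntStream.range(0, raw.size())
                .mapToObj(i -> raw.get(i) + m1.get(i)).toList());
        return topics;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1.0, 2.0, 3.0)).get("metric2"));
        // [3.0, 6.0, 9.0]
    }
}
```

Each iteration only ever reads already-materialized topics, so a new metric
never requires recomputing earlier ones, only another rewind-and-replay.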

Does this sound like a use case for Flink? Could you guide me a little on
whether this is currently feasible?

Cheers,

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
Blog: raul.io
<http://raul.io/?utm_source=email&utm_medium=email&utm_campaign=apache> |
twitter: @raulvk <https://twitter.com/raulvk>