Re: Possible use case: Simulating iterative batch processing by rewinding source

Christophe Salperwyck Wed, 06 Apr 2016 13:01:33 -0700

Hi,

I am interested too. For my part, I was thinking of using HBase as a backend
so that my data is stored sorted, which is nice for generating time series
in the right order.
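HBase keeps rows sorted by row key in lexicographic byte order, so whether time series come back in order depends on how the timestamp is encoded in the key. Below is a minimal sketch. It assumes a hypothetical key layout of series id, a NUL separator, then an 8-byte big-endian timestamp. `row_key`, `event_time` and the series id `"EURUSD"` are illustrative names only, and this is plain Python, not the HBase client API.

```python
import struct
from typing import List


def row_key(series_id: str, event_time_ms: int) -> bytes:
    """Build a hypothetical row key: <series id> 0x00 <timestamp>.

    Assumes series_id contains no NUL byte and event_time_ms is a
    non-negative integer below 2**64.
    """
    # Big-endian unsigned 64-bit encoding: comparing the bytes
    # lexicographically gives the same order as comparing the numbers.
    return series_id.encode("utf-8") + b"\x00" + struct.pack(">Q", event_time_ms)


def event_time(key: bytes) -> int:
    # The timestamp is always the last 8 bytes of the key.
    return struct.unpack(">Q", key[-8:])[0]


if __name__ == "__main__":
    ticks: List[int] = [1000, 900, 5]
    # Sorting keys as raw bytes (as a row-key-ordered store does) yields time order.
    print([event_time(k) for k in sorted(row_key("EURUSD", t) for t in ticks)])  # [5, 900, 1000]
    # Plain decimal strings do not: "1000" sorts before "5" and "900".
    print(sorted(str(t) for t in ticks))  # ['1000', '5', '900']
```

The NUL separator also keeps all keys of one series contiguous, for example `"A"` keys sort before any `"AB"` key.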


Cheers,
Christophe

2016-04-06 21:22 GMT+02:00 Raul Kripalani <ra...@apache.org>:

> Hello,
>
> I'm getting started with Flink for a use case that could leverage the
> window processing abilities of Flink that Spark does not offer.
>
> Basically I have dumps of timeseries data (10 years of ticks) from which I
> need to calculate many metrics, in an exploratory manner, based on event
> time. NOTE: I don't know the metrics beforehand; it's going to be an
> exploratory and iterative data analytics effort.
>
> Flink doesn't seem to support windows on batch processing, so I'm thinking
> about emulating batch by using the Kafka stream connector and rewinding the
> data stream for every new metric that I calculate, to process the full
> time series as a batch.
>
> Each metric I calculate should in turn be sent to another Kafka topic so I
> can use it in a subsequent processing batch, e.g.
>
> Iteration 1)   raw timeseries data ---> metric1
> Iteration 2)   raw timeseries data + metric1 (composite) ---> metric2
> Iteration 3)   metric1 + metric2 ---> metric3
> Iteration 4)   raw timeseries data + metric3 ---> metric4
> ...
>
> Does this sound like a use case for Flink? Could you give me some guidance
> on whether this is currently feasible?
>
> Cheers,
>
> *Raúl Kripalani*
> PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> Messaging Engineer
> http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> Blog: raul.io
> <http://raul.io/?utm_source=email&utm_medium=email&utm_campaign=apache> |
> twitter: @raulvk <https://twitter.com/raulvk>
>
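The iterative scheme in the quoted message can be sketched without Flink or Kafka: rewind the raw data, compute a metric, publish it, then combine it with the raw data or earlier metrics in the next pass. This is a minimal, framework-free sketch under stated assumptions. `ReplayableTopic` is a hypothetical in-memory stand-in for a Kafka topic re-read from the start. Ticks are assumed to be `(event_time_ms, value)` pairs. The three metrics (per-window mean, per-window variance, coefficient of variation) are illustrative choices, not the ones from the thread. No Flink or Kafka API is used.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed record shape: (event_time_ms, value). Not a Flink or Kafka type.
Record = Tuple[int, float]


class ReplayableTopic:
    """Hypothetical in-memory stand-in for a topic that can be rewound."""

    def __init__(self, records: List[Record]) -> None:
        # Store sorted by event time so every replay sees the same order.
        self._records = sorted(records)

    def rewind(self) -> Iterator[Record]:
        # Each call starts a fresh full pass over the data (one "batch").
        return iter(self._records)


def window_start(ts: int, size_ms: int) -> int:
    # Tumbling event-time window: align the timestamp down to a multiple of size_ms.
    return ts - ts % size_ms


def tumbling_mean(records: Iterable[Record], size_ms: int) -> List[Record]:
    # Average of the values in each tumbling window, keyed by window start.
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in records:
        start = window_start(ts, size_ms)
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def run_iterations(raw: ReplayableTopic, size_ms: int) -> Dict[str, ReplayableTopic]:
    # Iteration 1: raw -> metric1 (per-window mean).
    metric1 = ReplayableTopic(tumbling_mean(raw.rewind(), size_ms))

    # Iteration 2: raw + metric1 -> metric2 (per-window variance).
    # Rewind raw and join each tick with the metric1 value of its window.
    mean_by_window = dict(metric1.rewind())
    squared_dev = (
        (ts, (v - mean_by_window[window_start(ts, size_ms)]) ** 2)
        for ts, v in raw.rewind()
    )
    metric2 = ReplayableTopic(tumbling_mean(squared_dev, size_ms))

    # Iteration 3: metric1 + metric2 -> metric3 (coefficient of variation).
    # Both metrics are keyed by window start, so they join on that key.
    var_by_window = dict(metric2.rewind())
    metric3 = ReplayableTopic([
        (start, math.sqrt(var_by_window[start]) / mean if mean else math.nan)
        for start, mean in metric1.rewind()
    ])
    return {"metric1": metric1, "metric2": metric2, "metric3": metric3}


if __name__ == "__main__":
    raw = ReplayableTopic([(0, 1.0), (500, 3.0), (1000, 10.0), (1500, 10.0)])
    topics = run_iterations(raw, size_ms=1000)
    print(list(topics["metric3"].rewind()))  # [(0, 0.5), (1000, 0.0)]
```

Each pass reads only full replays of earlier "topics", which mirrors the batch-by-rewinding idea. Iteration 4 (raw + metric3) would follow the same join-on-window-start pattern as iteration 2.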
