Hi all,

I've been playing around with Apache Ignite and I want to run Flink on top
of it but there's something I'm not getting.

Ignite has its own support for clustering, and data is distributed across the
nodes using a partitioning (affinity) key. We can then run a closure and do
some computation on the node that owns the data (collocation of computation
[1]), thereby saving time and bandwidth. It all looks good, but I'm not sure
how it would play with Flink's own clustering capability.
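To make that concrete, here's a minimal, untested sketch of the collocation
part based on the docs; the cache name and key are just placeholders:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class AffinityComputeSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
        cache.put(42, "hello");

        // The closure is sent to, and runs on, the node that owns key 42,
        // so the value is read from the local partition instead of being
        // moved over the network.
        ignite.compute().affinityRun("myCache", 42, () -> {
            String val = Ignition.localIgnite()
                    .<Integer, String>cache("myCache")
                    .localPeek(42);
            System.out.println("Collocated value: " + val);
        });
    }
}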

My initial idea (which I haven't tried yet) is to use collocation to run a
closure where the data resides, and have that closure execute a Flink
pipeline locally on that node (running it in a local environment). Then,
with a custom-made data source, I should be able to feed the data from the
local Ignite cache into the Flink pipeline and write it back into a cache
using an Ignite sink.
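Roughly what I have in mind, as an untested sketch. LocalIgniteCacheSource
and IgniteCacheSink don't exist yet; they're the custom source/sink I'd
still have to write:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class LocalFlinkOnIgniteSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        int partitionKey = 42; // illustrative affinity key

        // Run the closure on the node that owns partitionKey in "inputCache".
        ignite.compute().affinityRun("inputCache", partitionKey, () -> {
            // Local environment: the pipeline is not distributed by Flink,
            // it runs entirely inside this node's JVM.
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.createLocalEnvironment();

            env.addSource(new LocalIgniteCacheSource("inputCache")) // hypothetical: reads the local partition
               .map(String::toUpperCase)                            // placeholder computation
               .addSink(new IgniteCacheSink("outputCache"));        // hypothetical: writes back to an Ignite cache

            try {
                env.execute("local-flink-on-ignite");
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}

The point would be that createLocalEnvironment() keeps the whole pipeline in
the same JVM as the data partition, so nothing leaves the node.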

I'm not sure it's a good idea to disable Flink's own distribution and run it
in a local environment just so the data is not transferred to another node.
I think it's the same problem with Kafka: if it partitions the data across
different nodes, how do you guarantee that Flink jobs are executed where the
data resides? If there's no way to guarantee that other than using a local
environment, what do you think of that approach (in terms of performance)?

Any additional insight regarding stream processing on Ignite or any other
distributed storage is very welcome!

Best regards,
Matt

[1] https://apacheignite.readme.io/docs/collocate-compute-and-data
