Hi all,

I've been playing around with Apache Ignite and I want to run Flink on top of it, but there's something I'm not getting.
Ignite has its own support for clustering: data is distributed across nodes according to a partitioned key, and you can then run a closure that does its computation on the node that owns the data (collocation of computation [1]), saving time and bandwidth. That all looks good, but I'm not sure how it plays with Flink's own clustering capability.

My initial idea, which I haven't tried yet, is to use collocation to run a closure where the data resides and have that closure execute a Flink pipeline locally on that node (using a local environment). With a custom data source I should then be able to feed the data from the local Ignite cache into the Flink pipeline and write it back into a cache with an Ignite sink. I've pasted a rough, untested sketch of this at the bottom of the email. What I'm unsure about is whether it's a good idea to give up Flink's own distribution and run it in a local environment just so the data isn't transferred to another node.

I think the same problem exists with Kafka: if it partitions the data across different nodes, how do you guarantee that Flink jobs are executed where the data resides? If there's no way to guarantee that short of using a local environment, what do you think of that approach in terms of performance?

Any additional insight regarding stream processing on Ignite or any other distributed storage is very welcome!

Best regards,
Matt

[1] https://apacheignite.readme.io/docs/collocate-compute-and-data
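P.S. Here's the rough, untested sketch I mentioned, just to show what I mean by running a local Flink pipeline inside a collocated Ignite closure. The cache names ("inputCache"/"outputCache"), the affinity key, and the doubling transformation are placeholders I made up, and I'm faking the Ignite sink with a plain SinkFunction that puts results back into a cache on the same node:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import javax.cache.Cache;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteRunnable;

public class CollocatedFlinkJob {

    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Run the closure on whatever node is primary for "someKey" in "inputCache".
        ignite.compute().affinityRun("inputCache", "someKey", (IgniteRunnable) () -> {
            Ignite local = Ignition.localIgnite();
            IgniteCache<String, Double> input = local.cache("inputCache");

            // Read only the entries this node is primary for, so nothing crosses the network.
            List<Double> values = new ArrayList<>();
            for (Cache.Entry<String, Double> e : input.localEntries(CachePeekMode.PRIMARY))
                values.add(e.getValue());

            try {
                // Local (non-distributed) Flink environment confined to this node.
                StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.createLocalEnvironment(1);

                DataStream<Double> stream = env.fromCollection(values);

                stream
                    // Placeholder transformation, just to have a pipeline.
                    .map(new MapFunction<Double, Double>() {
                        @Override
                        public Double map(Double value) {
                            return value * 2;
                        }
                    })
                    // Write results back into another cache on the same node.
                    .addSink(new SinkFunction<Double>() {
                        @Override
                        public void invoke(Double value) {
                            Ignition.localIgnite()
                                .<String, Double>cache("outputCache")
                                .put(UUID.randomUUID().toString(), value);
                        }
                    });

                env.execute("local-flink-on-ignite-node");
            }
            catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}

The intent is that localEntries(PRIMARY) plus createLocalEnvironment(1) keeps both the read and the computation on the node that owns the partition, which is exactly the part I'm unsure is a good trade-off.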