Apache Flink <=> Apache Ignite integration

Raul Kripalani Wed, 30 Mar 2016 04:48:39 -0700

Hello from the Apache Ignite community!

Last year there was an interesting thread [1] about integrating the two
projects. Unfortunately there's been little follow-through, so let's try to
fix that in 2016 ;-)


I'm sure a lot has changed in the Flink community with the recent
graduation and the 1.0 release, so I'd like to put together an updated list
of the synergies and areas of integration I can think of:

+++ *Ignite as a bidirectional Connector* +++

The first and most obvious integration point is Ignite as a source and a
sink for Flink. An Ignite contributor has already sent a pull request [2]
that provides a sink into Ignite Queues, but I feel this integration can go
deeper and offer more functionality. Moreover, it should be hosted in the
Flink source tree as a Connector (like the Kafka or Elasticsearch
connectors). In particular, we could offer these features:

* As a Flink sink => inject data directly into a cache via a DataStreamer
[3].
* As a Flink source => run a continuous query against one or more
caches [4].
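The sink/source split above can be illustrated with a small JDK-only model.
It does not use the real Flink or Ignite APIs: `FakeCache`,
`BufferedCacheSink` and the listener are hypothetical stand-ins for an Ignite
cache, a DataStreamer-backed Flink sink, and a continuous query's local
listener. A real connector would presumably wrap `IgniteDataStreamer` in a
Flink `RichSinkFunction` and register a `ContinuousQuery` inside a
`SourceFunction`, but that wiring is assumed and left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class IgniteConnectorSketch {

    // Hypothetical stand-in for an Ignite cache: stores entries and notifies
    // registered listeners on every update (loosely mimicking a continuous query).
    static final class FakeCache<K, V> {
        private final Map<K, V> data = new ConcurrentHashMap<>();
        private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

        void listen(BiConsumer<K, V> listener) {
            listeners.add(listener);
        }

        void putAll(Map<K, V> batch) {
            data.putAll(batch);
            // Notify every listener of each updated entry, in batch order.
            batch.forEach((k, v) -> listeners.forEach(l -> l.accept(k, v)));
        }

        int size() {
            return data.size();
        }
    }

    // Hypothetical sink: buffers records and writes them to the cache in batches,
    // the way a DataStreamer-backed Flink sink might; close() flushes the remainder.
    static final class BufferedCacheSink<K, V> implements AutoCloseable {
        private final FakeCache<K, V> cache;
        private final int batchSize;
        private final Map<K, V> buffer = new LinkedHashMap<>();

        BufferedCacheSink(FakeCache<K, V> cache, int batchSize) {
            this.cache = cache;
            this.batchSize = batchSize;
        }

        void invoke(K key, V value) {
            buffer.put(key, value);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            cache.putAll(new LinkedHashMap<>(buffer));
            buffer.clear();
        }

        @Override
        public void close() {
            flush();
        }
    }

    // Wires a sink and a "continuous query" listener to the same cache and
    // returns the events the listener saw, in order.
    static List<String> runDemo() {
        FakeCache<String, Integer> cache = new FakeCache<>();
        List<String> received = new CopyOnWriteArrayList<>();
        cache.listen((k, v) -> received.add(k + "=" + v));

        try (BufferedCacheSink<String, Integer> sink = new BufferedCacheSink<>(cache, 2)) {
            sink.invoke("a", 1);
            sink.invoke("b", 2); // buffer reaches batchSize: {a, b} is flushed
            sink.invoke("c", 3); // stays buffered until close()
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [a=1, b=2, c=3]
    }
}
```

Batching on the sink side is the main reason to go through a DataStreamer
rather than individual cache puts. The listener shows how the same cache
could then feed a downstream Flink pipeline as a source.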

+++ *Ignite as a state backend* +++

Flink could keep its state in Ignite either natively [5] or via the IGFS
(Ignite File System) interface, which can run as a Hadoop file system [6].

This would allow Flink to store intermediate state in Ignite. I believe
this is what you called "distributed backup for Streaming Operator State"
in the initial exchange; is that right?

+++ *Ignite as a DataSet API connector* +++

The ability to use Ignite as a source for batch pipelines, by executing
Ignite SQL queries [7] against a cache and feeding the results into a Flink
pipeline. This is basically the batch counterpart of the continuous query
source described above.
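The difference between this batch source and the continuous query source
can be sketched with another JDK-only model. Ignite's SQL API is not used
here: `snapshotQuery` is a hypothetical helper, and its `Predicate` stands in
for an SQL `WHERE` clause. The point is that the query runs once and yields a
bounded result, which is what a DataSet input needs. Later cache updates
would only reach a streaming source.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SnapshotQuerySketch {

    // Hypothetical batch source: evaluates a filter (standing in for an SQL WHERE
    // clause) once over the current cache contents and returns a finished list.
    static <K, V> List<V> snapshotQuery(Map<K, V> cache, Predicate<V> where) {
        return cache.values().stream()
                .filter(where)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            cache.put(i, i * 10);
        }
        List<Integer> batch = snapshotQuery(cache, v -> v > 20);
        System.out.println(batch); // [30, 40, 50]

        // An update after the query ran is not reflected in the batch result;
        // picking it up is the job of the streaming (continuous query) source.
        cache.put(6, 60);
        System.out.println(batch); // [30, 40, 50]
    }
}
```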

+++ *Ignite as an execution backend* +++

You already mentioned this in [1], and I think that, through Ignite's
Compute API, it makes for a perfect synergy between the two projects.

Do you still agree with this? Has anything changed since last year that I
should take into account?

+++ *Ignite as a parameter server* +++

This was in the initial proposal [1], but its status is not clear to me. I
have found references to the idea of a Parameter Server in Flink, but only
as proposals. Was this feature eventually implemented, or is it on the
future roadmap?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is just an updated proposal from my side, but I'm sure that both
communities can, and will want to, chime in!

Cheers,

[1] https://mail-archives.apache.org/mod_mbox/flink-dev/201504.mbox/%3CCANC1h_u__KgsdOo2SZ4M=8jf3zomozs3xbekq0erjj9p4wf...@mail.gmail.com%3E
[2] https://issues.apache.org/jira/browse/IGNITE-813
[3] https://ignite.apache.org/features/streaming.html
[4] http://apacheignite.gridgain.org/v1.5/docs/continuous-queries
[5] https://apacheignite-fs.readme.io/docs/igfs
[6] https://apacheignite-fs.readme.io/docs/file-system
[7] https://apacheignite.readme.io/docs/sql-queries

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
Blog: raul.io | twitter: @raulvk <https://twitter.com/raulvk>