Dear Flink Community!

Some of you may have heard it already from announcements or from a Flink Forward talk: Alibaba has decided to open source its in-house improvements to Flink, called Blink! First of all, a big thanks to the team that developed these improvements and made this contribution possible!

Blink has some very exciting enhancements, most prominently on the Table API/SQL side and in the unified execution of these programs. For batch (bounded) data, the SQL execution has full TPC-DS coverage (which is a big deal), and the execution is more than 10x faster than the current SQL runtime in Flink. Blink has also added support for catalogs and improved both the failover speed of batch queries and the resource management. It also takes some good steps toward more deeply unifying batch and streaming execution.

The proposal is to merge Blink's enhancements into Flink, to give Flink's SQL/Table API and execution a big boost in usability and performance.

Just to avoid any confusion: This is not a suggested change of focus to batch processing, nor would this break with any of the streaming architecture and vision of Flink. This contribution very much follows the principle of "batch is a special case of streaming". As a special case, batch makes special optimizations possible. In its current state, Flink does not exploit many of these optimizations. This contribution adds exactly these optimizations and makes the streaming model of Flink applicable to harder batch use cases.

Assuming that the community is excited about this as well, and in favor of these enhancements to Flink's capabilities, below are some thoughts on how this contribution and integration could work.

--- Making the code available ---

At the moment, the Blink code is in the form of a big Flink fork (rather than isolated patches on top of Flink), so the integration is unfortunately not as easy as merging a few patches or pull requests.

To support a non-disruptive merge of such a big contribution, I believe it makes sense to make the code of the fork available in the Flink project first.
From there on, we can start to work on the details for merging the enhancements, including the refactoring of the necessary parts in the Flink master and the Blink code to make a merge possible without repeatedly breaking compatibility.

The first question is where to put the code of the Blink fork during the merging procedure. My first thought was to temporarily add a repository (like "flink-blink-staging"), but we could also put it into a special branch in the main Flink repository.

I will start a separate thread to discuss a possible strategy for handling and merging such a big contribution.

Best,
Stephan