Hey Stephan and others, thanks for the summary. I'm very excited about the outlined improvements. :-)
Separate branch vs. fork: I'm fine with either of the suggestions. Depending on the expected strategy for merging the changes, the expected number of additional changes, etc., one or the other approach might be better suited.

– Ufuk

On Tue, Jan 22, 2019 at 9:20 AM Kurt Young <ykt...@gmail.com> wrote:

> Hi Driesprong,
>
> Glad to hear that you're interested in Blink's code. Actually, Blink only has one branch by itself, so either a separate repo or a Flink branch works for sharing Blink's code.
>
> Best,
> Kurt
>
> On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> > Great news Stephan!
> >
> > Why not make the code available by having a fork of Flink on Alibaba's GitHub account? This would allow us to do easy diffs in the GitHub UI and create PRs of cherry-picked commits if needed. I can imagine that the Blink codebase has a lot of branches by itself, so just pushing a couple of branches to the main Flink repo is not ideal. Looking forward to it!
> >
> > Cheers, Fokko
> >
> > On Tue, Jan 22, 2019 at 03:48, Shaoxuan Wang <wshaox...@gmail.com> wrote:
> >
> > > Big +1 to contributing the Blink codebase directly into the Apache Flink project. Looking forward to the new journey.
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <xiaow...@gmail.com> wrote:
> > >
> > > > Thanks Stephan! We are hoping to make the process as non-disruptive as possible to the Flink community. Making the Blink codebase public is the first step that hopefully facilitates further discussions.
> > > > Xiaowei
> > > >
> > > > On Monday, January 21, 2019, 11:46:28 AM PST, Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > Dear Flink Community!
> > > >
> > > > Some of you may have heard it already from announcements or from a Flink Forward talk: Alibaba has decided to open source its in-house improvements to Flink, called Blink! First of all, big thanks to the team that developed these improvements and made this contribution possible!
> > > >
> > > > Blink has some very exciting enhancements, most prominently on the Table API/SQL side and the unified execution of these programs. For batch (bounded) data, the SQL execution has full TPC-DS coverage (which is a big deal), and the execution is more than 10x faster than the current SQL runtime in Flink. Blink has also added support for catalogs, improved the failover speed of batch queries, and improved resource management. It also makes some good steps in the direction of more deeply unifying batch and streaming execution.
> > > >
> > > > The proposal is to merge Blink's enhancements into Flink, to give Flink's SQL/Table API and execution a big boost in usability and performance.
> > > >
> > > > Just to avoid any confusion: this is not a suggested change of focus to batch processing, nor would it break with any of the streaming architecture and vision of Flink. This contribution follows very much the principle of "batch is a special case of streaming". As a special case, batch makes special optimizations possible. In its current state, Flink does not exploit many of these optimizations. This contribution adds exactly these optimizations and makes the streaming model of Flink applicable to harder batch use cases.
> > > >
> > > > Assuming that the community is excited about this as well, and in favor of these enhancements to Flink's capabilities, below are some thoughts on how this contribution and integration could work.
> > > >
> > > > --- Making the code available ---
> > > >
> > > > At the moment, the Blink code is in the form of a big Flink fork (rather than isolated patches on top of Flink), so the integration is unfortunately not as easy as merging a few patches or pull requests.
> > > >
> > > > To support a non-disruptive merge of such a big contribution, I believe it makes sense to make the code of the fork available in the Flink project first. From there on, we can start to work on the details for merging the enhancements, including the refactoring of the necessary parts in the Flink master and the Blink code to make a merge possible without repeatedly breaking compatibility.
> > > >
> > > > The first question is where to put the code of the Blink fork during the merging procedure. My first thought was to temporarily add a repository (like "flink-blink-staging"), but we could also put it into a special branch in the main Flink repository.
> > > >
> > > > I will start a separate thread to discuss a possible strategy for handling and merging such a big contribution.
> > > >
> > > > Best,
> > > > Stephan