I think that is a reasonable proposal. Bugs that are identified could be fixed in the blink branch, so that we merge working code.
New feature contributions to that branch would complicate the merge. I would rather focus on merging and let new contributions go to the master branch.

On Tue, Jan 22, 2019 at 11:12 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:

> Hi Stephan,
>
> Thanks for bringing up the discussions. I'm +1 on the merging plan. One
> question though: since the merge will not be completed for some time and
> there might be users trying the blink branch, what's the plan for the
> development in the branch? Personally I think we may discourage big
> contributions to the branch, which would further complicate the merge,
> while we shouldn't stop critical fixes as well.
>
> What's your take on this?
>
> Thanks,
> Xuefu
>
>
> ------------------------------------------------------------------
> From:Stephan Ewen <se...@apache.org>
> Sent At:2019 Jan. 22 (Tue.) 06:16
> To:dev <dev@flink.apache.org>
> Subject:[DISCUSS] A strategy for merging the Blink enhancements
>
> Dear Flink community!
>
> As a follow-up to the thread announcing Alibaba's offer to contribute the
> Blink code [1], here are some thoughts on how this contribution could be
> merged.
>
> As described in the announcement thread, it is a big contribution, and we
> need to carefully plan how to handle the contribution. We would like to get
> the improvements to Flink, while making it as non-disruptive as possible for
> the community. I hope that this plan gives the community a better
> understanding of what the proposed contribution would mean.
>
> Here is an initial rough proposal, with thoughts from
> Timo, Piotr, Dawid, Kurt, Shaoxuan, Jincheng, Jark, Aljoscha, Fabian,
> Xiaowei:
>
> - It is obviously very hard to merge all changes in a quick move, because
>   we are talking about multiple 100k lines of code.
>
> - As much as possible, we want to maintain compatibility with the current
>   Table API, so that this becomes a transparent change for most users.
>
> - The two areas with the most changes we identified were
>   (1) The SQL/Table query processor
>   (2) The batch scheduling/failover/shuffle
>
> - For the query processor part, this is what we found and propose:
>
>   -> The Blink and Flink code have the same semantics (ANSI SQL) except
>      for minor aspects (under discussion). Blink also covers more SQL
>      operations.
>
>   -> The Blink code is quite different from the current Flink SQL runtime.
>      Merging as changes seems hardly feasible. From the current evaluation,
>      the Blink query processor uses the more advanced architecture, so it
>      would make sense to converge to that design.
>
>   -> We propose to gradually build up the Blink-based query processor as a
>      second query processor under the SQL/Table API. Think of it as two
>      different runners for the Table API.
>      As the new query processor becomes fully merged and stable, we can
>      deprecate and eventually remove the existing query processor. That
>      should give the least disruption to Flink users and allow for gradual
>      merge/development.
>
>   -> Some refactoring of the Table API is necessary to support the above
>      strategy. Most of the prerequisite refactoring is around splitting the
>      project into different modules, following a similar idea as
>      FLIP-28 [2].
>
>   -> A more detailed proposal is being worked on.
>
>   -> Same as FLIP-28, this approach would probably need to suspend Table
>      API contributions for a short while. We hope that this can be a very
>      short period, to not impact the very active development in Flink on
>      Table API/SQL too much.
>
> - For the batch scheduling and failover enhancements, we should be able to
>   build on the currently ongoing refactoring of the scheduling logic [3].
>   That should make it easy to plug in a new scheduler and failover logic.
>   We can port the Blink enhancements as a new scheduler / failover handler.
>   We can later make it the default for bounded stream programs once the
>   merge is completed and it is tested.
>
> - For the catalog and source/sink design and interfaces, we would like to
>   continue with the already started design discussion threads. Once these
>   are converged, we might use some of the Blink code for the
>   implementation, if it is close to the outcome of the design discussions.
>
> Best,
> Stephan
>
> [1] https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
> [3] https://issues.apache.org/jira/browse/FLINK-10429