Re: [DISCUSS] A strategy for merging the Blink enhancements

Stephan Ewen Wed, 23 Jan 2019 06:28:42 -0800

I think that is a reasonable proposal. Bugs that are identified could be
fixed in the blink branch, so that we merge the working code.


New feature contributions to that branch would complicate the merge. I
would rather focus on merging and let new contributions go to the
master branch.

On Tue, Jan 22, 2019 at 11:12 PM Zhang, Xuefu <xuef...@alibaba-inc.com>
wrote:

> Hi Stephan,
>
> Thanks for bringing up the discussion. I'm +1 on the merging plan. One
> question though: since the merge will not be completed for some time and
> there might be users trying the Blink branch, what's the plan for
> development in the branch? Personally, I think we may discourage big
> contributions to the branch, which would further complicate the merge,
> while we shouldn't block critical fixes either.
>
> What's your take on this?
>
> Thanks,
> Xuefu
>
>
> ------------------------------------------------------------------
> From:Stephan Ewen <se...@apache.org>
> Sent At:2019 Jan. 22 (Tue.) 06:16
> To:dev <dev@flink.apache.org>
> Subject:[DISCUSS] A strategy for merging the Blink enhancements
>
> Dear Flink community!
>
> As a follow-up to the thread announcing Alibaba's offer to contribute the
> Blink code [1], here are some thoughts on how this contribution could be
> merged.
>
> As described in the announcement thread, it is a big contribution, and we
> need to carefully plan how to handle it. We would like to get the
> improvements into Flink while making the merge as non-disruptive as
> possible for the community. I hope that this plan gives the community a
> better understanding of what the proposed contribution would mean.
>
> Here is an initial rough proposal, with thoughts from
> Timo, Piotr, Dawid, Kurt, Shaoxuan, Jincheng, Jark, Aljoscha, Fabian,
> Xiaowei:
>
>   - It is obviously very hard to merge all changes in a quick move, because
> we
>     are talking about multiple 100k lines of code.
>
>   - As much as possible, we want to maintain compatibility with the current
> Table API,
>     so that this becomes a transparent change for most users.
>
>   - The two areas with the most changes we identified were
>      (1) The SQL/Table query processor
>      (2) The batch scheduling/failover/shuffle
>
>   - For the query processor part, this is what we found and propose:
>
>     -> The Blink and Flink code have the same semantics (ANSI SQL) except
> for minor
>        aspects (under discussion). Blink also covers more SQL operations.
>
>     -> The Blink code is quite different from the current Flink SQL
> runtime.
>        Merging as changes seems hardly feasible. From the current
> evaluation, the
>        Blink query processor uses the more advanced architecture, so it
> would make
>        sense to converge to that design.
>
>     -> We propose to gradually build up the Blink-based query processor as
> a second
>        query processor under the SQL/Table API. Think of it as two
> different runners
>        for the Table API.
>        As the new query processor becomes fully merged and stable, we can
> deprecate and
>        eventually remove the existing query processor. That should give the
> least
>        disruption to Flink users and allow for gradual merge/development.
>
>     -> Some refactoring of the Table API is necessary to support the above
>        strategy. Most of the prerequisite refactoring is around splitting
>        the project into different modules, following a similar idea as
>        FLIP-28 [2].
>
>     -> A more detailed proposal is being worked on.
>
>     -> Same as FLIP-28, this approach would probably need to suspend Table
> API
>        contributions for a short while. We hope that this can be a very
> short period,
>        to not impact the very active development in Flink on Table API/SQL
> too much.
>
>   - For the batch scheduling and failover enhancements, we should be able
>     to build on the currently ongoing refactoring of the scheduling logic
>     [3]. That should make it easy to plug in a new scheduler and failover
>     logic. We can port the Blink enhancements as a new scheduler / failover
>     handler. We can later make it the default for bounded stream programs
>     once the merge is completed and it is tested.
>
>   - For the catalog and source/sink design and interfaces, we would like to
>     continue with the already started design discussion threads. Once these
> are
>     converged, we might use some of the Blink code for the implementation,
> if it
>     is close to the outcome of the design discussions.
>
> Best,
> Stephan
>
> [1]
>
> https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
>
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
>
> [3] https://issues.apache.org/jira/browse/FLINK-10429
>
