Hi community,

Since Apache Flink 1.9.0 was released more than two weeks ago, I would like to kick off the discussion about what we want to achieve for the 1.10 release.
Based on discussions with various people as well as observations from mailing list threads, Yu Li and I have compiled a list of features that we deem important to include in the next release.

Note that the features presented here are not meant to be exhaustive. As always, I am sure that there will be other contributions that will make it into the next release. This email thread is merely meant to kick off a discussion and to give users and contributors an understanding of where the focus of the next release lies. If there is anything we have missed that somebody is working on, please reply to this thread.

** Proposed features and focus

Following the contribution of Blink to Apache Flink, the community released a preview of the Blink SQL Query Processor, which offers better SQL coverage and improved performance for batch queries, in Flink 1.9.0. However, the integration of the Blink query processor is not fully complete yet, as there are still pending tasks such as implementing full TPC-DS support. With the next Flink release, we aim to finish the Blink integration.

Furthermore, there are several ongoing work threads addressing long-standing issues reported by users, such as improving checkpointing under backpressure and limiting RocksDB's native memory usage, which can be especially problematic in containerized Flink deployments.

Notable features surrounding Flink's ecosystem that are planned for the next release include active Kubernetes support (i.e., enabling Flink's ResourceManager to launch new pods), improved Hive integration, Java 11 support, and new algorithms for the Flink ML library.

Below is the list of features that we compiled, ordered by priority – some of which already have ongoing mailing list threads, JIRAs, or FLIPs.

- Improving Flink's build system & CI [1] [2]
- Support Java 11 [3]
- Table API improvements
  - Configuration Evolution [4] [5]
  - Finish type system: Expression Re-design [6] and UDF refactor
  - Streaming DDL: Time attribute (watermark) and Changelog support
  - Full SQL partition support for both batch & streaming [7]
  - New Java Expression DSL [8] (see the sketch after this list)
  - SQL CLI with DDL and DML support
- Hive compatibility completion (DDL/UDF) to support full Hive integration
  - Partition/Function/View support
- Remaining Blink planner/runtime merge
  - Support all TPC-DS queries [9]
- Finer-grained resource management
  - Unified TaskExecutor Memory Configuration [10]
  - Fine Grained Operator Resource Management [11]
  - Dynamic Slot Allocation [12]
- Finish scheduler re-architecture [13]
  - Allows implementing more sophisticated scheduling strategies, such as a better batch scheduler or speculative execution.
- New DataStream Source Interface [14]
  - A new source connector architecture to unify the implementation of source connectors and make it simpler to implement custom source connectors.
- Add more source/system metrics
  - For better Flink job monitoring and to facilitate customized solutions such as auto-scaling.
- Executor Interface / Client API [15]
  - Allow downstream projects to more easily monitor and control Flink jobs.
- Interactive Programming [16]
  - Allow users to cache intermediate results in the Table API for later use, avoiding redundant computation when a Flink application contains multiple jobs.
- Python User Defined Function [17]
  - Support native user-defined functions in Flink's Python API, including UDF/UDAF/UDTF in the Table API and mixed Python-Java UDFs.
- Spillable heap backend [18]
  - A new state backend that automatically spills data to disk when memory is exhausted and loads it back when memory becomes available again.
- RocksDB backend memory control [19]
  - Prevent excessive memory usage by RocksDB, especially in containerized environments.
- Unaligned checkpoints [20]
  - Resolve the checkpoint timeout issue under backpressure.
- Separate framework and user class loader in per-job mode
- Active Kubernetes Integration [21]
  - Allow the ResourceManager to talk to Kubernetes to launch new pods, similar to Flink's YARN/Mesos integration.
- ML pipeline/library
  - Aims at delivering several core algorithms, including Logistic Regression, Naive Bayes, Random Forest, K-Means, etc.
- Add vertex subtask log URL to the WebUI [22]
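To illustrate the direction of the Table API improvements above, here is a minimal sketch of how a query could look with the Java Expression DSL proposed in FLIP-55 [8], compared with today's string-based expressions. The Expressions class and the $/lit factory methods below are taken from the FLIP-55 proposal and are assumptions on my side, not a finalized API:

    // Sketch only: Expressions, $(...) and lit(...) follow the FLIP-55
    // proposal and may change before the feature ships.
    import static org.apache.flink.table.api.Expressions.$;
    import static org.apache.flink.table.api.Expressions.lit;

    import org.apache.flink.table.api.Table;

    public class ExpressionDslSketch {

        static Table selectLargeOrders(Table orders) {
            // Today: expressions are embedded in strings and parsed at runtime,
            // so typos only surface when the job is submitted.
            Table viaStrings = orders
                    .filter("amount > 10")
                    .select("user, amount");

            // Proposed DSL: plain Java method calls that the compiler and the
            // IDE can check, without any string parsing.
            return orders
                    .filter($("amount").isGreater(lit(10)))
                    .select($("user"), $("amount"));
        }
    }

The main benefit is that expressions become regular Java objects, which removes the need for a custom string parser and makes the API friendlier to IDEs; the exact method names are still up for discussion in the FLIP.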
** Suggested release timeline

Based on our usual time-based release schedule [23], and considering that several events, such as Flink Forward Europe and Asia, overlap with the current release cycle, we should aim to release 1.10 around the beginning of January 2020. To give the community enough testing time, I propose that the feature freeze be at the end of November. We should announce an exact date later in the release cycle.

Lastly, I would like to take this opportunity to propose Yu Li and myself as release managers for the upcoming release.

What do you think?

Best,
Gary

[1] https://lists.apache.org/thread.html/775447a187410727f5ba6f9cefd6406c58ca5cc5c580aecf30cf213e@%3Cdev.flink.apache.org%3E
[2] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
[3] https://issues.apache.org/jira/browse/FLINK-10725
[4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
[5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
[6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
[7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
[8] https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
[9] https://issues.apache.org/jira/browse/FLINK-11491
[10] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[11] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
[12] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
[13] https://issues.apache.org/jira/browse/FLINK-10429
[14] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[15] https://lists.apache.org/thread.html/498dd3e0277681cda356029582c1490299ae01df912e15942e11ae8e@%3Cdev.flink.apache.org%3E
[16] https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
[17] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
[18] https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
[19] https://issues.apache.org/jira/browse/FLINK-7289
[20] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
[21] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Best-practice-to-run-flink-on-kubernetes-td31532.html
[22] https://issues.apache.org/jira/browse/FLINK-13894
[23] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases