Thanks Gary for kicking off the discussion for the 1.10 release. +1 for Gary and Yu as release managers. Thank you for your efforts.
Best,
Jark

> On Sep 7, 2019, at 00:52, zhijiang <wangzhijiang...@aliyun.com.INVALID> wrote:
>
> Hi Gary,
>
> Thanks for kicking off the feature discussion for the next release, 1.10. I am very supportive of you and Yu Li being the release managers.
>
> I would like to mention two more improvements that I want covered in Flink 1.10; I have already discussed them with Piotr and we reached an agreement.
>
> 1. Serialize and copy data only once for broadcast partitions [1]: This would greatly improve throughput in broadcast mode and was originally proposed for Flink 1.8. Most of the work was already done before; only the last critical JIRA/PR is left, so it will not take much effort to make it ready.
>
> 2. Let Netty use Flink's buffers directly in credit-based mode [2]: This would avoid the memory copy from the Netty stack into Flink's managed network buffers. The obvious benefit is a large reduction of the direct memory overhead in large-scale jobs. I have also heard of user cases that ran into direct-memory OOMs caused by Netty's memory overhead. This improvement was originally proposed by Nico for Flink 1.7, but there was never time to focus on it. Yun Gao submitted a PR half a year ago that has not been reviewed yet. I could help review the design and the PR code to make it ready.
>
> You could give these two items the lowest priority if necessary.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10745
> [2] https://issues.apache.org/jira/browse/FLINK-10742
>
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: Gary Yao <g...@apache.org>
> Send Time: Fri, Sep 6, 2019, 17:06
> To: dev <dev@flink.apache.org>
> Cc: carp84 <car...@gmail.com>
> Subject: [DISCUSS] Features for Apache Flink 1.10
>
> Hi community,
>
> Since Apache Flink 1.9.0 was released more than two weeks ago, I want to kick off the discussion about what we want to achieve for the 1.10 release.
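To make the first improvement above concrete: in broadcast mode today, a record is serialized once per downstream channel, whereas FLINK-10745 aims to serialize it once and reuse the same bytes for every channel. A minimal sketch of the difference (illustrative Python, not Flink's actual API; `Channel`, `broadcast_naive`, and `broadcast_serialize_once` are hypothetical names):

```python
# Sketch of the serialize-once broadcast idea behind FLINK-10745.
# All names here are illustrative, not Flink's real internals.

import pickle


class Channel:
    """Stands in for a downstream output channel."""

    def __init__(self):
        self.buffers = []

    def write(self, buf: bytes):
        # In Flink this would hand a network buffer to the channel;
        # here we simply retain the bytes.
        self.buffers.append(buf)


def broadcast_naive(record, channels):
    # Current behavior: the record is serialized once per channel
    # (N serializations and N payload allocations for N channels).
    for ch in channels:
        ch.write(pickle.dumps(record))


def broadcast_serialize_once(record, channels):
    # Proposed behavior: serialize once, then reuse the identical
    # payload for every channel.
    payload = pickle.dumps(record)
    for ch in channels:
        ch.write(payload)


channels = [Channel() for _ in range(4)]
broadcast_serialize_once({"id": 1, "value": "x"}, channels)
# Every channel received the exact same serialized payload.
assert all(ch.buffers[0] == channels[0].buffers[0] for ch in channels)
```

The sketch only models the payload reuse; the real work in the JIRA is in the record writer and buffer management, where the serialized form can be shared across all subpartitions.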
>
> Based on discussions with various people as well as observations from mailing list threads, Yu Li and I have compiled a list of features that we deem important to be included in the next release. Note that the features presented here are not meant to be exhaustive. As always, I am sure that there will be other contributions that will make it into the next release. This email thread is merely to kick off a discussion, and to give users and contributors an understanding where the focus of the next release lies. If there is anything we have missed that somebody is working on, please reply to this thread.
>
>
> ** Proposed features and focus
>
> Following the contribution of Blink to Apache Flink, the community released a preview of the Blink SQL Query Processor, which offers better SQL coverage and improved performance for batch queries, in Flink 1.9.0. However, the integration of the Blink query processor is not fully completed yet, as there are still pending tasks such as implementing full TPC-DS support. With the next Flink release, we aim at finishing the Blink integration.
>
> Furthermore, there are several ongoing work threads addressing long-standing issues reported by users, such as improving checkpointing under backpressure, and limiting RocksDB's native memory usage, which can be especially problematic in containerized Flink deployments.
>
> Notable features surrounding Flink's ecosystem that are planned for the next release include active Kubernetes support (i.e., enabling Flink's ResourceManager to launch new pods), improved Hive integration, Java 11 support, and new algorithms for the Flink ML library.
>
> Below I have included the list of features that we compiled, ordered by priority – some of which already have ongoing mailing list threads, JIRAs, or FLIPs.
>
> - Improving Flink's build system & CI [1] [2]
> - Support Java 11 [3]
> - Table API improvements
>   - Configuration Evolution [4] [5]
>   - Finish type system: Expression Re-design [6] and UDF refactoring
>   - Streaming DDL: Time attribute (watermark) and changelog support
>   - Full SQL partition support for both batch & streaming [7]
>   - New Java Expression DSL [8]
>   - SQL CLI with DDL and DML support
> - Hive compatibility completion (DDL/UDF) to support full Hive integration
>   - Partition/Function/View support
> - Remaining Blink planner/runtime merge
>   - Support all TPC-DS queries [9]
> - Finer-grained resource management
>   - Unified TaskExecutor Memory Configuration [10]
>   - Fine Grained Operator Resource Management [11]
>   - Dynamic Slots Allocation [12]
> - Finish scheduler re-architecture [13]
>   - Allows implementing more sophisticated scheduling strategies, such as a better batch scheduler or speculative execution.
> - New DataStream Source Interface [14]
>   - A new source connector architecture to unify the implementation of source connectors and make it simpler to implement custom source connectors.
> - Add more source/system metrics
>   - For better Flink job monitoring, and to facilitate customized solutions like auto-scaling.
> - Executor Interface / Client API [15]
>   - Allow Flink downstream projects to more easily monitor and control Flink jobs.
> - Interactive Programming [16]
>   - Allow users to cache intermediate results in the Table API for later usage, to avoid redundant computation when a Flink application contains multiple jobs.
> - Python User Defined Functions [17]
>   - Support native user-defined functions in Flink Python, including UDF/UDAF/UDTF in the Table API and mixed Python-Java UDFs.
> - Spillable heap backend [18]
>   - A new state backend supporting automatic data spill and load when memory is exhausted/regained.
> - RocksDB backend memory control [19]
>   - Prevent excessive memory usage by RocksDB, especially in container environments.
> - Unaligned checkpoints [20]
>   - Resolve the checkpoint timeout issue under backpressure.
> - Separate framework and user class loaders in per-job mode
> - Active Kubernetes Integration [21]
>   - Allow the ResourceManager to talk to Kubernetes to launch new pods, similar to Flink's YARN/Mesos integration.
> - ML pipeline/library
>   - Aims at delivering several core algorithms, including Logistic Regression, Naive Bayes, Random Forest, KMeans, etc.
> - Add vertex subtask log URL on the WebUI [22]
>
>
> ** Suggested release timeline
>
> Based on our usual time-based release schedule [23], and considering that several events, such as Flink Forward Europe and Asia, overlap with the current release cycle, we should aim at releasing 1.10 around the beginning of January 2020. To give the community enough testing time, I propose the feature freeze to be at the end of November. We should announce an exact date later in the release cycle.
>
> Lastly, I would like to use the opportunity to propose Yu Li and myself as release managers for the upcoming release.
>
> What do you think?
>
>
> Best,
> Gary
>
> [1] https://lists.apache.org/thread.html/775447a187410727f5ba6f9cefd6406c58ca5cc5c580aecf30cf213e@%3Cdev.flink.apache.org%3E
> [2] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> [3] https://issues.apache.org/jira/browse/FLINK-10725
> [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
> [5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
> [6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> [7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> [8] https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
> [9] https://issues.apache.org/jira/browse/FLINK-11491
> [10] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> [11] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> [12] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> [13] https://issues.apache.org/jira/browse/FLINK-10429
> [14] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> [15] https://lists.apache.org/thread.html/498dd3e0277681cda356029582c1490299ae01df912e15942e11ae8e@%3Cdev.flink.apache.org%3E
> [16] https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> [17] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> [18] https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
> [19] https://issues.apache.org/jira/browse/FLINK-7289
> [20] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
> [21] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Best-practice-to-run-flink-on-kubernetes-td31532.html
> [22] https://issues.apache.org/jira/browse/FLINK-13894
> [23] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases
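On the second improvement from zhijiang's reply (FLINK-10742): on the receiver side, incoming bytes currently land in a Netty-allocated buffer and are then copied into a Flink-managed network buffer, which roughly doubles the direct-memory pressure of the receive path. A rough sketch of the copy being eliminated (illustrative Python, not Netty/Flink code; all function and variable names are hypothetical):

```python
# Sketch of the extra copy that FLINK-10742 wants to remove.
# This only models the data flow; real Netty/Flink code manages
# pooled direct buffers, not Python bytearrays.

def receive_with_copy(wire_bytes: bytes, flink_pool: list) -> bytearray:
    # Current behavior: an intermediate Netty-side allocation, then a copy.
    netty_buf = bytearray(wire_bytes)        # allocation in Netty's pool
    flink_buf = flink_pool.pop()             # Flink-managed network buffer
    flink_buf[:len(netty_buf)] = netty_buf   # the memcpy to be eliminated
    return flink_buf


def receive_zero_copy(wire_bytes: bytes, flink_pool: list) -> bytearray:
    # Proposed behavior: let the network stack fill the Flink-managed
    # buffer directly, so no intermediate Netty-side buffer is needed.
    flink_buf = flink_pool.pop()
    flink_buf[:len(wire_bytes)] = wire_bytes
    return flink_buf


pool = [bytearray(16) for _ in range(2)]
buf = receive_zero_copy(b"payload", pool)
assert bytes(buf[:7]) == b"payload"
```

Both paths deliver the same bytes; the difference is that the direct-buffer footprint of the copying path grows with Netty's own pool, which matches the direct-memory OOMs mentioned above for large-scale jobs.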