Hi Gary,

Thanks for kicking off the release schedule for 1.10. +1 for you and Yu Li as the release managers.
The feature freeze/release time sounds reasonable.

Thanks,
Dian

> On Sep 7, 2019, at 11:30 AM, Jark Wu <imj...@gmail.com> wrote:
>
> Thanks Gary for kicking off the discussion for the 1.10 release.
>
> +1 for Gary and Yu as release managers. Thank you for your effort.
>
> Best,
> Jark
>
>
>> On Sep 7, 2019, at 00:52, zhijiang <wangzhijiang...@aliyun.com.INVALID> wrote:
>>
>> Hi Gary,
>>
>> Thanks for kicking off the features for the next release, 1.10. I am very
>> supportive of you and Yu Li being the release managers.
>>
>> I just want to mention another two improvements that I hope can be covered
>> in Flink 1.10; I already confirmed them with Piotr and we reached an
>> agreement on them before.
>>
>> 1. Serialize and copy data only once for broadcast partitions [1]: This
>> would greatly improve throughput in broadcast mode and was actually
>> proposed for Flink 1.8. Most of the work was already done before; only the
>> last critical JIRA/PR is left. It will not take much effort to make it
>> ready.
>>
>> 2. Let Netty use Flink's buffers directly in credit-based mode [2]: This
>> would avoid a memory copy from the Netty stack to Flink's managed network
>> buffers. The obvious benefit is greatly decreasing the direct memory
>> overhead in large-scale jobs. I have also heard of some user cases that
>> encountered direct memory OOM caused by Netty's memory overhead. This
>> improvement was actually proposed by Nico for Flink 1.7, but there was
>> never time to focus on it. Yun Gao already submitted a PR half a year ago,
>> but it has not been reviewed yet. I could help review the design and the
>> PR code to make it ready.
>>
>> You could treat these two items as lowest priority if needed.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-10745
>> [2] https://issues.apache.org/jira/browse/FLINK-10742
>>
>> Best,
>> Zhijiang
>> ------------------------------------------------------------------
>> From: Gary Yao <g...@apache.org>
>> Send Time: Sep 6, 2019 (Friday) 17:06
>> To: dev <dev@flink.apache.org>
>> Cc: carp84 <car...@gmail.com>
>> Subject: [DISCUSS] Features for Apache Flink 1.10
>>
>> Hi community,
>>
>> Since Apache Flink 1.9.0 was released more than 2 weeks ago, I want to kick
>> off the discussion about what we want to achieve for the 1.10 release.
>>
>> Based on discussions with various people as well as observations from
>> mailing list threads, Yu Li and I have compiled a list of features that we
>> deem important to be included in the next release. Note that the features
>> presented here are not meant to be exhaustive. As always, I am sure that
>> there will be other contributions that will make it into the next release.
>> This email thread is merely meant to kick off a discussion and to give
>> users and contributors an understanding of where the focus of the next
>> release lies. If there is anything we have missed that somebody is working
>> on, please reply to this thread.
>>
>>
>> ** Proposed features and focus
>>
>> Following the contribution of Blink to Apache Flink, the community released
>> a preview of the Blink SQL Query Processor, which offers better SQL
>> coverage and improved performance for batch queries, in Flink 1.9.0.
>> However, the integration of the Blink query processor is not fully
>> completed yet, as there are still pending tasks, such as implementing full
>> TPC-DS support. With the next Flink release, we aim at finishing the Blink
>> integration.
>>
>> Furthermore, there are several ongoing work threads addressing
>> long-standing issues reported by users, such as improving checkpointing
>> under backpressure and limiting RocksDB's native memory usage, which can
>> be especially problematic in containerized Flink deployments.
>>
>> Notable features surrounding Flink's ecosystem that are planned for the
>> next release include active Kubernetes support (i.e., enabling Flink's
>> ResourceManager to launch new pods), improved Hive integration, Java 11
>> support, and new algorithms for the Flink ML library.
>>
>> Below I have included the list of features that we compiled, ordered by
>> priority; some of them already have ongoing mailing list threads, JIRAs,
>> or FLIPs.
>>
>> - Improving Flink's build system & CI [1] [2]
>> - Support Java 11 [3]
>> - Table API improvements
>>   - Configuration Evolution [4] [5]
>>   - Finish type system: Expression Re-design [6] and UDF refactor
>>   - Streaming DDL: Time attribute (watermark) and Changelog support
>>   - Full SQL partition support for both batch & streaming [7]
>>   - New Java Expression DSL [8] (see the illustrative sketch at the end of
>>     this message)
>>   - SQL CLI with DDL and DML support
>> - Hive compatibility completion (DDL/UDF) to support full Hive integration
>>   - Partition/Function/View support
>> - Remaining Blink planner/runtime merge
>>   - Support all TPC-DS queries [9]
>> - Finer grained resource management
>>   - Unified TaskExecutor Memory Configuration [10]
>>   - Fine Grained Operator Resource Management [11]
>>   - Dynamic Slots Allocation [12]
>> - Finish scheduler re-architecture [13]
>>   - Allows implementing more sophisticated scheduling strategies, such as
>>     a better batch scheduler or speculative execution.
>> - New DataStream Source Interface [14]
>>   - A new source connector architecture to unify the implementation of
>>     source connectors and make it simpler to implement custom source
>>     connectors.
>> - Add more source/system metrics
>>   - For better Flink job monitoring and to facilitate customized solutions
>>     like auto-scaling.
>> - Executor Interface / Client API [15]
>>   - Allow Flink downstream projects to monitor and control Flink jobs more
>>     easily.
>> - Interactive Programming [16]
>>   - Allow users to cache intermediate results in the Table API for later
>>     usage, to avoid redundant computation when a Flink application
>>     contains multiple jobs.
>> - Python User Defined Function [17]
>>   - Support native user-defined functions in Flink Python, including
>>     UDF/UDAF/UDTF in the Table API and Python-Java mixed UDFs.
>> - Spillable heap backend [18]
>>   - A new state backend supporting automatic data spill and load when
>>     memory is exhausted/regained.
>> - RocksDB backend memory control [19]
>>   - Prevent excessive memory usage from RocksDB, especially in
>>     containerized environments.
>> - Unaligned checkpoints [20]
>>   - Resolve the checkpoint timeout issue under backpressure.
>> - Separate framework and user class loader in per-job mode
>> - Active Kubernetes Integration [21]
>>   - Allow the ResourceManager to talk to Kubernetes to launch new pods,
>>     similar to Flink's YARN/Mesos integration.
>> - ML pipeline/library
>>   - Aims at delivering several core algorithms, including Logistic
>>     Regression, Naive Bayes, Random Forest, KMeans, etc.
>> - Add vertex subtask log URL on WebUI [22]
>>
>>
>> ** Suggested release timeline
>>
>> Based on our usual time-based release schedule [23], and considering that
>> several events, such as Flink Forward Europe and Asia, are overlapping
>> with the current release cycle, we should aim at releasing 1.10 around the
>> beginning of January 2020. To give the community enough testing time, I
>> propose the feature freeze to be at the end of November. We should
>> announce an exact date later in the release cycle.
>>
>> Lastly, I would like to use the opportunity to propose Yu Li and myself as
>> release managers for the upcoming release.
>>
>> What do you think?
>>
>>
>> Best,
>> Gary
>>
>> [1] https://lists.apache.org/thread.html/775447a187410727f5ba6f9cefd6406c58ca5cc5c580aecf30cf213e@%3Cdev.flink.apache.org%3E
>> [2] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
>> [3] https://issues.apache.org/jira/browse/FLINK-10725
>> [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>> [5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
>> [6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
>> [7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
>> [8] https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
>> [9] https://issues.apache.org/jira/browse/FLINK-11491
>> [10] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>> [11] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>> [12] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
>> [13] https://issues.apache.org/jira/browse/FLINK-10429
>> [14] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>> [15] https://lists.apache.org/thread.html/498dd3e0277681cda356029582c1490299ae01df912e15942e11ae8e@%3Cdev.flink.apache.org%3E
>> [16] https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
>> [17] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>> [18] https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
>> [19] https://issues.apache.org/jira/browse/FLINK-7289
>> [20] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
>> [21] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Best-practice-to-run-flink-on-kubernetes-td31532.html
>> [22] https://issues.apache.org/jira/browse/FLINK-13894
>> [23] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases
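
To make the "New Java Expression DSL [8]" item above a bit more concrete, here is a rough, illustrative sketch of what a query built with such a DSL could look like compared to today's string-based expressions. The names used ($, lit, times, as, and the Expressions class) follow the FLIP-55 draft and are assumptions for illustration, not a confirmed final API.

// Illustrative only: names follow the FLIP-55 proposal and may differ from
// the API that eventually ships.
import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.lit;

import org.apache.flink.table.api.Table;

public class ExpressionDslSketch {

    /** Selects the user and a doubled amount from a hypothetical "orders" table. */
    static Table doubledAmounts(Table orders) {
        // Today the expression lives in a string that is only parsed at runtime:
        //   orders.select("user, amount * 2 as doubledAmount");
        //
        // With the proposed DSL the same query is built from typed Java calls,
        // so typos surface at compile time and IDEs can auto-complete:
        return orders.select(
                $("user"),
                $("amount").times(lit(2)).as("doubledAmount"));
    }
}

The point of the FLIP is exactly this shift: expressions become ordinary Java method calls instead of embedded strings parsed at runtime.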