Thanks Gary for kicking off the discussion for the 1.10 release. +1 for Gary and Yu as release managers. Thank you for your efforts.
Best,
Jark

> On Sep 7, 2019, at 00:52, zhijiang <wangzhijiang...@aliyun.com.INVALID> wrote:
>
> Hi Gary,
>
> Thanks for kicking off the feature discussion for the next release, 1.10. I am very supportive of you and Yu Li being the release managers.
>
> I would like to mention two more improvements that I want covered in Flink 1.10; I have already discussed them with Piotr and we reached an agreement.
>
> 1. Serialize and copy data only once for broadcast partitions [1]: This would greatly improve throughput in broadcast mode and was originally proposed for Flink 1.8. Most of the work was already done before; only the last critical JIRA/PR is left, so it will not take much effort to make it ready.
>
> 2. Let Netty use Flink's buffers directly in credit-based mode [2]: This would avoid the memory copy from the Netty stack into Flink's managed network buffers. The obvious benefit is a large reduction of the direct memory overhead in large-scale jobs. I have also heard of user cases that ran into direct-memory OOMs caused by Netty's memory overhead. This improvement was originally proposed by Nico for Flink 1.7, but there was never time to focus on it. Yun Gao submitted a PR half a year ago that has not been reviewed yet. I could help review the design and the PR code to make it ready.
>
> You could give these two items the lowest priority if necessary.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10745
> [2] https://issues.apache.org/jira/browse/FLINK-10742
>
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: Gary Yao <g...@apache.org>
> Send Time: Fri, Sep 6, 2019, 17:06
> To: dev <dev@flink.apache.org>
> Cc: carp84 <car...@gmail.com>
> Subject: [DISCUSS] Features for Apache Flink 1.10
>
> Hi community,
>
> Since Apache Flink 1.9.0 was released more than two weeks ago, I want to kick off the discussion about what we want to achieve for the 1.10 release.
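To make the first improvement above concrete: in broadcast mode today, a record is serialized once per downstream channel, whereas FLINK-10745 aims to serialize it once and reuse the same bytes for every channel. A minimal sketch of the difference (illustrative Python, not Flink's actual API; `Channel`, `broadcast_naive`, and `broadcast_serialize_once` are hypothetical names):

```python
# Sketch of the serialize-once broadcast idea behind FLINK-10745.
# All names here are illustrative, not Flink's real internals.

import pickle


class Channel:
    """Stands in for a downstream output channel."""

    def __init__(self):
        self.buffers = []

    def write(self, buf: bytes):
        # In Flink this would hand a network buffer to the channel;
        # here we simply retain the bytes.
        self.buffers.append(buf)


def broadcast_naive(record, channels):
    # Current behavior: the record is serialized once per channel
    # (N serializations and N payload allocations for N channels).
    for ch in channels:
        ch.write(pickle.dumps(record))


def broadcast_serialize_once(record, channels):
    # Proposed behavior: serialize once, then reuse the identical
    # payload for every channel.
    payload = pickle.dumps(record)
    for ch in channels:
        ch.write(payload)


channels = [Channel() for _ in range(4)]
broadcast_serialize_once({"id": 1, "value": "x"}, channels)
# Every channel received the exact same serialized payload.
assert all(ch.buffers[0] == channels[0].buffers[0] for ch in channels)
```

The sketch only models the payload reuse; the real work in the JIRA is in the record writer and buffer management, where the serialized form can be shared across all subpartitions.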
>
> Based on discussions with various people as well as observations from mailing list threads, Yu Li and I have compiled a list of features that we deem important to be included in the next release. Note that the features presented here are not meant to be exhaustive. As always, I am sure that there will be other contributions that will make it into the next release. This email thread is merely to kick off a discussion, and to give users and contributors an understanding where the focus of the next release lies. If there is anything we have missed that somebody is working on, please reply to this thread.
>
>
> ** Proposed features and focus
>
> Following the contribution of Blink to Apache Flink, the community released a preview of the Blink SQL Query Processor, which offers better SQL coverage and improved performance for batch queries, in Flink 1.9.0. However, the integration of the Blink query processor is not fully completed yet, as there are still pending tasks such as implementing full TPC-DS support. With the next Flink release, we aim at finishing the Blink integration.
>
> Furthermore, there are several ongoing work threads addressing long-standing issues reported by users, such as improving checkpointing under backpressure, and limiting RocksDB's native memory usage, which can be especially problematic in containerized Flink deployments.
>
> Notable features surrounding Flink's ecosystem that are planned for the next release include active Kubernetes support (i.e., enabling Flink's ResourceManager to launch new pods), improved Hive integration, Java 11 support, and new algorithms for the Flink ML library.
>
> Below I have included the list of features that we compiled, ordered by priority – some of which already have ongoing mailing list threads, JIRAs, or FLIPs.
>
> - Improving Flink's build system & CI [1] [2]
> - Support Java 11 [3]
> - Table API improvements
>   - Configuration Evolution [4] [5]
>   - Finish type system: Expression Re-design [6] and UDF refactoring
>   - Streaming DDL: Time attribute (watermark) and changelog support
>   - Full SQL partition support for both batch & streaming [7]
>   - New Java Expression DSL [8]
>   - SQL CLI with DDL and DML support
> - Hive compatibility completion (DDL/UDF) to support full Hive integration
>   - Partition/Function/View support
> - Remaining Blink planner/runtime merge
>   - Support all TPC-DS queries [9]
> - Finer-grained resource management
>   - Unified TaskExecutor Memory Configuration [10]
>   - Fine Grained Operator Resource Management [11]
>   - Dynamic Slots Allocation [12]
> - Finish scheduler re-architecture [13]
>   - Allows implementing more sophisticated scheduling strategies, such as a better batch scheduler or speculative execution.
> - New DataStream Source Interface [14]
>   - A new source connector architecture to unify the implementation of source connectors and make it simpler to implement custom source connectors.
> - Add more source/system metrics
>   - For better Flink job monitoring, and to facilitate customized solutions like auto-scaling.
> - Executor Interface / Client API [15]
>   - Allow Flink downstream projects to more easily monitor and control Flink jobs.
> - Interactive Programming [16]
>   - Allow users to cache intermediate results in the Table API for later usage, to avoid redundant computation when a Flink application contains multiple jobs.
> - Python User Defined Functions [17]
>   - Support native user-defined functions in Flink Python, including UDF/UDAF/UDTF in the Table API and mixed Python-Java UDFs.
> - Spillable heap backend [18]
>   - A new state backend supporting automatic data spill and load when memory is exhausted/regained.
> - RocksDB backend memory control [19]
>   - Prevent excessive memory usage by RocksDB, especially in container environments.
> - Unaligned checkpoints [20]
>   - Resolve the checkpoint timeout issue under backpressure.
> - Separate framework and user class loaders in per-job mode
> - Active Kubernetes Integration [21]
>   - Allow the ResourceManager to talk to Kubernetes to launch new pods, similar to Flink's YARN/Mesos integration.
> - ML pipeline/library
>   - Aims at delivering several core algorithms, including Logistic Regression, Naive Bayes, Random Forest, KMeans, etc.
> - Add vertex subtask log URL on the WebUI [22]
>
>
> ** Suggested release timeline
>
> Based on our usual time-based release schedule [23], and considering that several events, such as Flink Forward Europe and Asia, overlap with the current release cycle, we should aim at releasing 1.10 around the beginning of January 2020. To give the community enough testing time, I propose the feature freeze to be at the end of November. We should announce an exact date later in the release cycle.
>
> Lastly, I would like to use the opportunity to propose Yu Li and myself as release managers for the upcoming release.
>
> What do you think?
>
>
> Best,
> Gary
>
> [1] https://lists.apache.org/thread.html/775447a187410727f5ba6f9cefd6406c58ca5cc5c580aecf30cf213e@%3Cdev.flink.apache.org%3E
> [2] https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E
> [3] https://issues.apache.org/jira/browse/FLINK-10725
> [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
> [5] https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
> [6] https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> [7] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> [8] https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
> [9] https://issues.apache.org/jira/browse/FLINK-11491
> [10] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> [11] https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> [12] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> [13] https://issues.apache.org/jira/browse/FLINK-10429
> [14] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> [15] https://lists.apache.org/thread.html/498dd3e0277681cda356029582c1490299ae01df912e15942e11ae8e@%3Cdev.flink.apache.org%3E
> [16] https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> [17] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> [18] https://cwiki.apache.org/confluence/display/FLINK/FLIP-50%3A+Spill-able+Heap+Keyed+State+Backend
> [19] https://issues.apache.org/jira/browse/FLINK-7289
> [20] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
> [21] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Best-practice-to-run-flink-on-kubernetes-td31532.html
> [22] https://issues.apache.org/jira/browse/FLINK-13894
> [23] https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases
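On the second improvement from zhijiang's reply (FLINK-10742): on the receiver side, incoming bytes currently land in a Netty-allocated buffer and are then copied into a Flink-managed network buffer, which roughly doubles the direct-memory pressure of the receive path. A rough sketch of the copy being eliminated (illustrative Python, not Netty/Flink code; all function and variable names are hypothetical):

```python
# Sketch of the extra copy that FLINK-10742 wants to remove.
# This only models the data flow; real Netty/Flink code manages
# pooled direct buffers, not Python bytearrays.

def receive_with_copy(wire_bytes: bytes, flink_pool: list) -> bytearray:
    # Current behavior: an intermediate Netty-side allocation, then a copy.
    netty_buf = bytearray(wire_bytes)        # allocation in Netty's pool
    flink_buf = flink_pool.pop()             # Flink-managed network buffer
    flink_buf[:len(netty_buf)] = netty_buf   # the memcpy to be eliminated
    return flink_buf


def receive_zero_copy(wire_bytes: bytes, flink_pool: list) -> bytearray:
    # Proposed behavior: let the network stack fill the Flink-managed
    # buffer directly, so no intermediate Netty-side buffer is needed.
    flink_buf = flink_pool.pop()
    flink_buf[:len(wire_bytes)] = wire_bytes
    return flink_buf


pool = [bytearray(16) for _ in range(2)]
buf = receive_zero_copy(b"payload", pool)
assert bytes(buf[:7]) == b"payload"
```

Both paths deliver the same bytes; the difference is that the direct-buffer footprint of the copying path grows with Netty's own pool, which matches the direct-memory OOMs mentioned above for large-scale jobs.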