Hi Stephan,

Thanks for summarizing the work and discussions into a roadmap. It really helps users understand where Flink is heading. The entire outline looks good to me. If appropriate, I would recommend adding two more categories to the roadmap.
*Flink ML Enhancement*
- Refactor the ML pipeline on the Table API
- Python support for the Table API
- Support for streaming training & inference
- Seamless integration of DL engines (TensorFlow, PyTorch, etc.)
- ML platform with a suite of AI tooling
Some of this work has already been discussed on the dev mailing list.
Related JIRA (FLINK-11095) and discussions:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html

*Flink-Runtime-Web Improvement*
- Much of this comes via Blink
- Refactor the entire module to use the latest Angular (7.x)
- Add resource information at three levels: Cluster, TaskManager, and Job
- Add operator-level topology and data flow tracing
- Add new metrics to track back pressure, filtering, and data skew
- Add log association to Job, Vertex, and SubTasks
Related JIRA (FLINK-10705) and discussion:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html

What do you think?

Regards,
Shaoxuan

On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Recently several contributors, committers, and users asked about making
> it more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it
> is very hard to get a good overall picture.
> Especially for new users and contributors, it is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest adding a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> <https://beam.apache.org/roadmap/>, which was received very well by the
> community. I would suggest following a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> ========================================================
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC).
> This is not an authoritative roadmap in the sense of a plan with a
> specific timeline. Instead, we share our vision for the future and major
> initiatives that are receiving attention, and give users and contributors
> an understanding of what they can look forward to.
>
> *Future Role of Table API and DataStream API*
> - Table API becomes a first-class citizen
> - Table API becomes the primary API for analytics use cases
>   * Declarative, automatic optimizations
>   * No manual control over state and timers
> - DataStream API becomes the primary API for applications and data
>   pipeline use cases
>   * Physical, user controls data types, no magic or optimizer
>   * Explicit control over state and time
>
> *Batch Streaming Unification*
> - Table API unification (environments) (FLIP-32)
> - New unified source interface (FLIP-27)
> - Runtime operator unification & code reuse between DataStream / Table
> - Extending the Table API to make it a convenient API for all analytical
>   use cases (easier mixing in of UDFs)
> - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
> - Much of this comes via the Blink contribution/merging
> - Fine-grained fault tolerance on bounded data (Table API)
> - Batch scheduling on bounded data (Table API)
> - External shuffle service support on bounded streams
> - Caching of intermediate results on bounded data (Table API)
> - Extending the DataStream API to explicitly model bounded streams
>   (API breaking)
> - Add fine-grained fault tolerance, scheduling, and caching also to the
>   DataStream API
>
> *Streaming State Evolution*
> - Let all built-in serializers support stable evolution
> - First-class support for other evolvable formats (Protobuf, Thrift)
> - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
> - Event time alignment in sources
> - Simpler out-of-the-box support in sources
>
> *Checkpointing*
> - Consistency of side effects: suspend / end with savepoint (FLIP-34)
> - Failed checkpoints explicitly aborted on TaskManagers (not only on the
>   coordinator)
>
> *Automatic Scaling (adjusting parallelism)*
> - Reactive scaling
> - Active scaling policies
>
> *Kubernetes Integration*
> - Active Kubernetes integration (Flink actively manages containers)
>
> *SQL Ecosystem*
> - Extended metadata store / catalog / schema registry support
> - DDL support
> - Integration with the Hive ecosystem
>
> *Simpler Handling of Dependencies*
> - Scala in the APIs, but not in the core (hidden in a separate class loader)
> - Hadoop-free by default
>
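For readers new to the thread, the "Future Role of Table API and DataStream API" section in the quoted roadmap boils down to: declarative queries where the planner manages state, versus explicit programs where the user manages keyed state and timers. Below is a minimal sketch of that contrast against the Java APIs of the Flink 1.x line at the time of this thread. The table-environment factory and conversion methods used here (TableEnvironment.getTableEnvironment, registerDataStream, toRetractStream) are the 1.7-era names and have been reworked in later releases, so treat the exact signatures as assumptions rather than the definitive API; the class name ApiRolesSketch and the sample data are made up for illustration.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class ApiRolesSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A tiny example stream of (userId, count) events.
        DataStream<Tuple2<String, Long>> clicks = env.fromElements(
                Tuple2.of("user-1", 1L), Tuple2.of("user-2", 1L), Tuple2.of("user-1", 1L));

        // Table API / SQL side: declarative analytics. The program states *what*
        // to compute; the planner decides how state is organized and updated.
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
        tableEnv.registerDataStream("clicks", clicks, "userId, cnt");
        Table totals = tableEnv.sqlQuery(
                "SELECT userId, SUM(cnt) AS total FROM clicks GROUP BY userId");
        tableEnv.toRetractStream(totals, Row.class).print();

        // DataStream API side: explicit control. The program owns the keyed state
        // and decides exactly when it is read, updated, and emitted.
        clicks.keyBy(new KeySelector<Tuple2<String, Long>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Long> value) {
                        return value.f0;
                    }
                })
                .process(new KeyedProcessFunction<String, Tuple2<String, Long>, Tuple2<String, Long>>() {
                    private transient ValueState<Long> total;

                    @Override
                    public void open(Configuration parameters) {
                        total = getRuntimeContext().getState(
                                new ValueStateDescriptor<>("total", Long.class));
                    }

                    @Override
                    public void processElement(Tuple2<String, Long> value, Context ctx,
                            Collector<Tuple2<String, Long>> out) throws Exception {
                        // Manual running sum per key, kept in user-managed ValueState.
                        Long current = total.value();
                        long updated = (current == null ? 0L : current) + value.f1;
                        total.update(updated);
                        out.collect(Tuple2.of(value.f0, updated));
                    }
                })
                .print();

        env.execute("api-roles-sketch");
    }
}

The SQL half leaves the choice of operators and state layout to the optimizer, while the KeyedProcessFunction half spells out every ValueState read and update by hand, which is exactly the division of responsibilities the roadmap item describes.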