xintongsong commented on code in PR #777:
URL: https://github.com/apache/flink-web/pull/777#discussion_r1984490539


##########
docs/content/posts/2025-03-01-release-2.0.0.md:
##########
@@ -0,0 +1,1639 @@
+---
+authors:
+- xtsong:
+  name: "Xintong Song"
+date: "2025-03-01T08:00:00Z"
+title: "Apache Flink 2.0.0: A new Era of Real-Time Data Processing"
+aliases:
+- /news/2025/03/01/release-2.0.html
+---
+
+Today, the Flink PMC is proud to announce the official release of Apache Flink 
2.0.0! This marks the first release in the Flink 2.x series and is the first 
major release since Flink 1.0 launched nine years ago. This version is the 
culmination of two years of meticulous preparation and collaboration, 
signifying a new chapter in the evolution of Flink.
+
+In this release, 159 contributors have come together to complete 25 FLIPs 
(Flink Improvement Proposals) and resolve 351 issues. We extend our heartfelt gratitude 
to all contributors for their invaluable contributions to this milestone 
release!
+
+Over the past decade, Apache Flink has undergone transformative evolution. In 
the 1.0 era, Flink pioneered Stateful Computations over Data Streams, making 
end-to-end exactly-once stateful stream processing a reality. Today, real-time 
processing with sub-second latency has become a standard expectation. However, 
users of real-time computing now face new challenges that hinder broader 
adoption. The costs of real-time computing have remained prohibitively high, 
both in terms of expensive resource consumption and the steep learning curve 
required to master complex distributed stream processing concepts. These 
barriers limit the application of real-time computing across more diverse use 
cases. Meanwhile, the rapid emergence of modern trends such as cloud-native 
architectures, data lakes, and AI large language models (LLMs) has introduced new requirements for 
real-time systems. In the 2.0 era, Flink is tackling these challenges head-on. 
By addressing these pain points, Flink aims to deliver more accessible and 
scalable real-time computing solutions, empowering organizations to fully 
embrace real-time capabilities across the entire spectrum of big data and AI 
applications. This new chapter represents Flink's commitment to making 
real-time computing more practical, efficient, and widely applicable than ever 
before.
+
+In the 2.0 release, Flink introduces several innovative features that address 
key challenges in real-time data processing and align with the growing demands 
of modern applications, including AI-driven workflows.
+- The **Disaggregated State Management** architecture enables more efficient 
resource utilization in cloud-native environments, ensuring high-performance 
real-time processing while minimizing resource overhead.
+- The introduction and refinement of **Materialized Tables** empower users to 
focus on business logic without needing to understand the complexities of 
stream processing or the differences between stream and batch execution modes, 
simplifying development and enhancing productivity for users across various 
domains. Optimizations in **Batch Execution** mode provide a cost-effective 
alternative for scenarios where near-real-time or non-real-time processing is 
sufficient, expanding Flink's versatility for diverse use cases.
+- Additionally, the deep integration with Apache Paimon strengthens the 
**Streaming Lakehouse** architecture, making Flink a leading solution for 
real-time data lake use cases.
+- As AI and LLMs continue to gain prominence, the demand for scalable, 
real-time data processing solutions grows. Flink 2.0's advancements in 
performance, resource efficiency, and ease of use position it as a strong 
foundation for **AI workflows**, ensuring that Flink remains at the forefront 
of real-time data processing innovations.
+
+These enhancements collectively demonstrate Flink's commitment to addressing 
the evolving needs of modern data applications, including the integration of 
real-time processing capabilities with AI-driven systems.
+
+In addition to the new features introduced in Flink 2.0, the release also 
includes a comprehensive cleanup of deprecated APIs and configurations, which 
may result in backward-incompatible changes in certain interfaces and 
behaviors. Users upgrading to this version should pay special attention to 
these changes to ensure a smooth transition.
+
+# Highlights of New Features
+
+## Disaggregated State Management
+
+The past decade has seen a transformative evolution in Flink's deployment 
paradigms, workload patterns, and hardware advancements. From the tightly 
coupled compute-storage nodes of the map-reduce era, we have transitioned to a 
cloud-native world where containerized deployments on Kubernetes are now the 
norm. To fully embrace this shift, Flink 2.0 introduces Disaggregated State 
Storage and Management, leveraging Distributed File Systems (DFS) as the 
primary storage medium. This architectural innovation addresses critical 
challenges posed by the cloud-native environment while enabling new levels of 
scalability, performance, and flexibility.
+
+This new architecture addresses the following challenges that the cloud-native 
era poses for Flink:
+1. Local disk constraints in containerized deployments
+2. Spiky resource usage caused by compaction in the current state model
+3. Fast rescaling for jobs with large states (hundreds of terabytes)
+4. Lightweight and fast checkpoints in a native way
+
+While extending the state store to interact with remote DFS seems like a 
straightforward solution, it is insufficient due to Flink's existing blocking 
execution model. To overcome this limitation, Flink 2.0 introduces an 
asynchronous execution model alongside a disaggregated state backend, as well 
as newly designed SQL operators performing asynchronous state access in 
parallel.
+
+Flink 2.0 delivers a comprehensive end-to-end experience for disaggregated 
state management, encompassing both the runtime and SQL operator layers:
+
+### Asynchronous Execution Model
+
+- Out-of-Order Record Processing: Decouples state access from computation to 
enable parallel execution (a conceptual sketch follows this list).
+- Asynchronous State APIs: Full support for non-blocking state operations 
during checkpointing, reducing latency and improving resource utilization.
+- Semantic Preservation: Maintains core Flink guarantees (e.g., watermark 
propagation, timer handling, and key ordering) to ensure that users can adopt 
the new architecture without worrying about behavioral changes in their 
applications.
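
To make the first point concrete, here is a minimal, purely conceptual Java 
sketch of out-of-order record processing. It does not use Flink's actual 
asynchronous state API; it only illustrates how state lookups can be issued as 
futures so that work on other records overlaps with state I/O, which is the 
core idea behind the new execution model (the Flink runtime additionally 
preserves watermarks, timers, and key ordering, as noted above).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

/** Conceptual illustration only; this is not the Flink asynchronous state API. */
public class AsyncStateSketch {

    // Stand-in for a slow, remote state lookup (e.g. against DFS-backed state).
    static CompletableFuture<Long> asyncLookup(String key, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(10); // simulate remote I/O latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return (long) key.hashCode(); // stand-in for the stored value
        }, io);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newFixedThreadPool(4);
        List<String> records = List.of("a", "b", "c", "d");

        // Issue all state lookups without blocking on each record individually;
        // processing of other records proceeds while the I/O is in flight.
        List<CompletableFuture<String>> results = records.stream()
                .map(r -> asyncLookup(r, io).thenApply(v -> r + " -> " + v))
                .collect(Collectors.toList());

        // Only here do we wait; the per-record state accesses were overlapped.
        results.forEach(f -> System.out.println(f.join()));
        io.shutdown();
    }
}
```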
+
+### Enhanced SQL Operators
+
+- Leveraging the new asynchronous state APIs, Flink 2.0 re-implements seven 
critical SQL operators, including stateful operations like Joins and Aggregates 
(e.g., Window Aggregation, Group Aggregation). These optimizations target 
high-latency state access scenarios, enabling non-blocking execution to 
maximize throughput.
+- Users can enable this feature by setting the configuration parameter 
`table.exec.async-state.enabled` (see the example after this list). Once 
activated, all supported SQL operators within a job automatically switch to 
asynchronous state access mode without requiring code changes.
+- In the Nexmark benchmark, 11 out of 14 stateful queries are now fully 
compatible with the asynchronous execution model, demonstrating significant 
performance improvements. Efforts are underway to extend support to the 
remaining stateful operators.
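
As a small illustration of the configuration switch mentioned above, the sketch 
below enables asynchronous state access for SQL operators through the Table 
API. The commented-out query and its table are hypothetical.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AsyncStateSqlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Switch supported SQL operators to asynchronous state access.
        tEnv.getConfig().set("table.exec.async-state.enabled", "true");

        // Any supported stateful query now uses asynchronous state access with
        // no code changes, e.g. (hypothetical table):
        // tEnv.executeSql(
        //     "SELECT user_id, COUNT(*) FROM Orders GROUP BY user_id").print();
    }
}
```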
+
+### ForStDB - A Disaggregated State Backend
+
+- ForStDB, which stands for For Streaming DB, is a purpose-built, 
disaggregated state backend designed to meet the unique demands of cloud-native 
deployments. By decoupling state storage from compute resources, ForStDB 
removes the limitations of local disk usage and supports parallel multi-I/O 
operations, effectively mitigating the impact of increased latency.
+- ForStDB's integration with DFS ensures durability and fault tolerance while 
maintaining high performance through optimized read/write operations. It also 
makes checkpointing and recovery lightweight and fast.
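
Below is a minimal sketch of how a job might opt into the disaggregated state 
backend. The backend identifier `forst` is an assumption based on the backend's 
name; please verify the exact option values against the Flink 2.0 configuration 
documentation.

```java
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForStBackendSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Select the disaggregated state backend; "forst" is assumed to be the
        // backend's registered identifier (check the 2.0 docs to confirm).
        conf.set(
                ConfigOptions.key("state.backend.type").stringType().noDefaultValue(),
                "forst");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... define sources, transformations, and sinks as usual, and point
        // checkpointing at a DFS path so state and checkpoints live remotely ...
    }
}
```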
+
+### Performance Evaluation (on [Nexmark](https://github.com/nexmark/nexmark))
+
+<div style="text-align: center;">
+<img src="/img/blog/2025-03-01-release-2.0.0/nexmark.png" 
style="width:70%;margin:15px">
+</div>
+
+- For stateful queries with heavy I/O (q5,q7,q18,q19,q20), Flink 2.0 with 
disaggregated state and a 1GB cache achieves 75% ~ 120% of the throughput of 
the traditional local state store solution. Notably, the state 
sizes for these queries range from 1.2GB to 4.8GB, and even under constrained 
caching conditions, the disaggregated state architecture with limited local 
cache demonstrates competitive performance against the fully local state setup. 
Remarkably, even without any caching, the asynchronous model still delivers 
approximately 50% of the throughput achieved by the local state store.
+- For stateful queries with small state ranging from 10MB to 400MB 
(q3,q4,q5,q8,q12,q17), states fully reside in the memory block cache, rendering 
disk I/O negligible. The disaggregated state store’s performance trails the 
local state store by an average of no more than 10% in these cases.
+- Benchmark results confirm the disaggregated state architecture’s capability 
to efficiently handle large-scale stateful workloads. It emerges as a seamless, 
high-performance alternative to traditional aggregated state storage, without 
significant performance trade-offs.
+
+Flink 2.0's Disaggregated State Management represents a pivotal step toward a 
truly cloud-native future. By addressing key challenges such as local disk 
constraints, spiky resource usage, and the need for fast rescaling, this 
architecture empowers users to build scalable, high-performance streaming 
applications. With the introduction of the asynchronous execution model and 
ForStDB, along with enhanced SQL operator capabilities, we expect Flink 2.0 to 
be a new standard for stateful stream processing in the cloud-native era.
+
+## Stream-Batch Unification
+
+### Materialized Table
+
+Materialized Tables represent a cornerstone of our vision to unify stream and 
batch processing paradigms. These tables enable users to declaratively manage 
both real-time and historical data through a single pipeline, eliminating the 
need for separate codebases or workflows.
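
As a hedged illustration of this declarative style, the sketch below creates a 
Materialized Table whose result is allowed to be at most 30 minutes stale. The 
table names, columns, and freshness interval are hypothetical, and it assumes 
the current catalog supports Materialized Tables (for example, a Paimon 
catalog); see the Materialized Table documentation for the authoritative DDL.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MaterializedTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical pipeline: Flink keeps `daily_revenue` fresh within the
        // declared interval, deciding itself how to refresh the result.
        tEnv.executeSql(
                "CREATE MATERIALIZED TABLE daily_revenue "
                        + "FRESHNESS = INTERVAL '30' MINUTE "
                        + "AS SELECT order_date, SUM(amount) AS revenue "
                        + "FROM orders GROUP BY order_date");
    }
}
```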
+
+In this release, with a focus on production-grade operability, we have made 
critical enhancements to simplify lifecycle management and execution in 
real-world environments:
+
+**Query Modifications** - Materialized Tables now support schema and query 
updates, enabling seamless iteration of business logic without reprocessing 
historical data. This is vital for production scenarios requiring rapid schema 
evolution and computational adjustments.
+
+**Kubernetes/Yarn Submission** - Beyond standalone clusters, Flink 2.0 extends 
native support for submitting Materialized Table refresh jobs to YARN and 
Kubernetes clusters. This allows users to seamlessly integrate refresh 
workflows into their production-grade infrastructure, leveraging standardized 
resource management, fault tolerance, and scalability. 
+
+**Ecosystem Integration** - Collaborating with the Apache Paimon community, 
Materialized Tables now integrate natively with Paimon’s lake storage format, 
combining Flink’s stream-batch compute with Paimon’s high-performance ACID 
transactions for unified data serving.
+
+By streamlining modifications and execution on production infrastructure, 
Materialized Tables empower teams to unify streaming and batch pipelines with 
higher reliability. Future iterations will deepen production support, including 
integration with production-ready schedulers to enable policy-driven refresh 
automation.
+
+### Adaptive Batch Execution
+
+Flink possesses adaptive batch execution capabilities that optimize execution 
plans based on runtime information to enhance performance. Key features include 
dynamic partition pruning, runtime filter, and automatic parallelism adjustment 
based on data volume. In Flink 2.0, we have further strengthened these 
capabilities with two new optimizations:
+
+**Adaptive Broadcast Join** - Compared to Shuffled Hash Join and Sort Merge 
Join, Broadcast Join eliminates the need for large-scale data shuffling and 
sorting, delivering superior execution efficiency. However, its applicability 
depends on one side of the input being sufficiently small; otherwise, 
performance or stability issues may arise. During the static SQL optimization 
phase, accurately estimating the input data volume of a Join operator is 
challenging, making it difficult to determine whether Broadcast Join is 
suitable. By enabling adaptive execution optimization, Flink dynamically 
captures the actual input conditions of Join operators at runtime and 
automatically switches to Broadcast Join when criteria are met, significantly 
improving execution efficiency.
+
+**Automatic Join Skew Optimization** - In Join operations, frequent 
occurrences of specific keys may lead to significant disparities in data 
volumes processed by downstream Join tasks. Tasks handling larger data volumes 
can become long-tail bottlenecks, severely delaying overall job execution. 
Through the Adaptive Skewed Join optimization, Flink leverages runtime 
statistical information from Join operator inputs to dynamically split skewed 
data partitions while ensuring the integrity of Join results. This effectively 
mitigates long-tail latency caused by data skew.
+
+See more details about the capabilities and usages of Flink's [Adaptive Batch 
Execution](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/adaptive_batch/).
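
For context, here is a minimal sketch of a Table API job running in batch 
execution mode, where the adaptive optimizations described above are applied by 
the planner at runtime based on observed data sizes. The tables and join are 
hypothetical; the optimizer options that tune this behavior are listed in the 
linked Adaptive Batch Execution documentation and are not repeated here.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AdaptiveBatchSketch {
    public static void main(String[] args) {
        // Batch execution mode: decisions such as switching to a broadcast join
        // or splitting skewed partitions are made at runtime by the planner.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Hypothetical join: if `dim` turns out to be small at runtime it can be
        // broadcast; heavily skewed keys on `fact` can be split across subtasks
        // while preserving the correctness of the join result.
        // tEnv.executeSql(
        //     "SELECT f.id, f.v, d.name FROM fact f JOIN dim d ON f.id = d.id")
        //     .print();
    }
}
```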

Review Comment:
   I'll update it for all doc urls before merging.


