Re: [PR] [FLINK-37432][release] Add release note for release 2.0 [flink]

via GitHub Fri, 07 Mar 2025 09:23:32 -0800


davidradl commented on code in PR #26266:
URL: https://github.com/apache/flink/pull/26266#discussion_r1985429682



##########
docs/content/release-notes/flink-2.0.md:
##########
@@ -0,0 +1,1666 @@
+---
+title: "Release Notes - Flink 2.0"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Release notes - Flink 2.0
+
+These release notes discuss important aspects, such as configuration, behavior 
or dependencies,
+that changed between Flink 1.20 and Flink 2.0. Please read these notes 
carefully if you are 
+planning to upgrade your Flink version to 2.0.
+
+## New Features & Behavior Changes
+
+### State & Checkpoints
+
+#### Disaggregated State Storage and Management
+
+##### [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
+
+The past decade has witnessed a dramatic shift in Flink's deployment mode, 
workload patterns, and hardware improvements. We've moved from the map-reduce 
era where workers are computation-storage tightly coupled nodes to a 
cloud-native world where containerized deployments on Kubernetes become 
standard. To enable Flink's Cloud-Native future, we introduce Disaggregated 
State Storage and Management that uses remote storage as primary storage in 
Flink 2.0.
+
+This new architecture solves the following challenges brought in the 
cloud-native era for Flink.
+1. Local Disk Constraints in containerization
+2. Spiky Resource Usage caused by compaction in the current state model
+3. Fast Rescaling for jobs with large states (hundreds of Terabytes)
+4. Light and Fast Checkpoint in a native way
+
+While extending the state store to interact with remote DFS seems like a 
straightforward solution, it is insufficient due to Flink's existing blocking 
execution model. To overcome this limitation, Flink 2.0 introduces an 
asynchronous execution model alongside a disaggregated state backend, as well 
as newly designed SQL operators performing asynchronous state access in 
parallel.
+
+#### Native file copy support
+
+##### [FLINK-35886](https://issues.apache.org/jira/browse/FLINK-35886)
+
+Users can now configure Flink to use s5cmd to speed up downloading files from 
S3 during the recovery process, when using RocksDB, by a factor of 2.
+
+#### Synchronize rescaling with checkpoint creation to minimize reprocessing 
for the AdaptiveScheduler
+
+##### [FLINK-35549](https://issues.apache.org/jira/browse/FLINK-35549)
+
+This enables the user to synchronize checkpointing and rescaling in the 
AdaptiveScheduler. New configuration parameters were introduced for the maximum 
trigger delay and the number of acceptable failed checkpoints before triggering 
a rescale to make this behavior configurable. These parameters were updated in 
[FLINK-36015](https://issues.apache.org/jira/browse/FLINK-36015).
+
+### Runtime & Coordination
+
+#### Further Optimization of Adaptive Batch Execution
+
+##### [FLINK-36333](https://issues.apache.org/jira/browse/FLINK-36333), 
[FLINK-36159](https://issues.apache.org/jira/browse/FLINK-36159)
+
+Flink possesses adaptive batch execution capabilities that optimize execution 
plans based on runtime information to enhance performance. Key features include 
dynamic partition pruning, Runtime Filter, and automatic parallelism adjustment 
based on data volume. In Flink 2.0, we have further strengthened these 
capabilities with two new optimizations:
+
+*Adaptive Broadcast Join* - Compared to Shuffled Hash Join and Sort Merge 
Join, Broadcast Join eliminates the need for large-scale data shuffling and 
sorting, delivering superior execution efficiency. However, its applicability 
depends on one side of the input being sufficiently small; otherwise, 
performance or stability issues may arise. During the static SQL optimization 
phase, accurately estimating the input data volume of a Join operator is 
challenging, making it difficult to determine whether Broadcast Join is 
suitable. By enabling adaptive execution optimization, Flink dynamically 
captures the actual input conditions of Join operators at runtime and 
automatically switches to Broadcast Join when criteria are met, significantly 
improving execution efficiency.
+
+*Automatic Join Skew Optimization* - In Join operations, frequent occurrences 
of specific keys may lead to significant disparities in data volumes processed 
by downstream Join tasks. Tasks handling larger data volumes can become 
long-tail bottlenecks, severely delaying overall job execution. Through the 
Adaptive Skewed Join optimization, Flink leverages runtime statistical 
information from Join operator inputs to dynamically split skewed data 
partitions while ensuring the integrity of Join results. This effectively 
mitigates long-tail latency caused by data skew.
+
+See more details about the capabilities and usages of Flink's [Adaptive Batch 
Execution](https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/deployment/adaptive_batch/).
+
+#### Adaptive Scheduler respects `execution.state-recovery.from-local` flag now
+
+##### [FLINK-36201](https://issues.apache.org/jira/browse/FLINK-36201)
+
+AdaptiveScheduler now respects `execution.state-recovery.from-local` flag, 
which defaults to false. As a result you now need to opt-in to make local 
recovery work.
+
+#### Align the desired and sufficient resources definition in Executing and 
WaitForResources states
+
+##### [FLINK-36014](https://issues.apache.org/jira/browse/FLINK-36014)
+
+The new configuration 
`jobmanager.adaptive-scheduler.executing.resource-stabilization-timeout` for 
the AdaptiveScheduler was introduced. It defines a duration for which the 
JobManager delays the scaling operation after a resource change if only 
sufficient resources are available.
+
+The existing configuration 
`jobmanager.adaptive-scheduler.min-parallelism-increase` was deprecated and is 
not used by Flink anymore.
+
+#### Incorrect watermark idleness timeout accounting when subtask is 
backpressured/blocked
+
+##### [FLINK-35886](https://issues.apache.org/jira/browse/FLINK-35886)
+
+For detecting idleness, the way how idleness timeout is calculated has 
changed. Previously the time, when source or source's split has been 
backpressured or blocked due to watermark alignment, was accounted towards the 
idleness timeout. This could lead to a situation where sources or some splits 
were incorrectly switching to idle, while they were being unable to make any 
progress and had some more records to emit, which in turn could result in 
incorrectly calculated watermarks and erroneous late data. This has been fixed 
for 2.0.
+
+This change required some API changes, like introduction of 
`org.apache.flink.api.common.eventtime.WatermarkGeneratorSupplier.Context#getInputActivityClock`.
 However this shouldn't create compatibility problems for users upgrading from 
prior Flink versions.

Review Comment:
   best to list all the changes .
   nit: like introduction -> like the introduction
   nit: `However this shouldn't create compatibility problems for users 
upgrading from prior Flink versions.` Does this mean it might? If there are no 
compatibility issue that we should say "`However this will not create 
compatibility problems for users upgrading ...."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-37432][release] Add release note for release 2.0 [flink]

Reply via email to