superdiaodiao commented on code in PR #25091:
URL: https://github.com/apache/flink/pull/25091#discussion_r1691112012
########## docs/content.zh/release-notes/flink-1.20.md ##########
@@ -0,0 +1,434 @@

---
title: "Release Notes - Flink 1.20"
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Release notes - Flink 1.20

These release notes discuss important aspects, such as configuration, behavior, or dependencies,
that changed between Flink 1.19 and Flink 1.20. Please read these notes carefully if you are
planning to upgrade your Flink version to 1.20.

### Checkpoints

#### Unified File Merging Mechanism for Checkpoints

##### [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)

The unified file merging mechanism for checkpointing is introduced in Flink 1.20 as an MVP
("minimum viable product") feature. It allows scattered small checkpoint files to be written
into larger files, reducing the number of file creations and deletions and alleviating the
file system metadata pressure caused by the file flooding problem during checkpoints. The
mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`. For
more advanced options and the principles behind this feature, please refer to the
`Checkpointing` documentation.

#### Reorganize State & Checkpointing & Recovery Configuration

##### [FLINK-34255](https://issues.apache.org/jira/browse/FLINK-34255)

All options related to state and checkpointing have been reorganized and categorized by
prefix, as listed below:

1. `execution.checkpointing`: all configurations associated with checkpointing and savepoints.
2. `execution.state-recovery`: all configurations pertinent to state recovery.
3. `state.*`: all configurations related to state access.
   1. `state.backend.*`: options specific to individual state backends, such as RocksDB.
   2. `state.changelog`: configurations for the changelog, as outlined in FLIP-158, including options for the "Durable Short-term Log" (DSTL).
   3. `state.latency-track`: configurations related to latency tracking of state access.

Meanwhile, all the original options, previously scattered across different places, are
annotated as `@Deprecated`.

#### Use common thread pools when transferring RocksDB state files

##### [FLINK-35501](https://issues.apache.org/jira/browse/FLINK-35501)

The semantics of `state.backend.rocksdb.checkpoint.transfer.thread.num` changed slightly:
if negative, the common (TM) IO thread pool (see `cluster.io-pool.size`) is used for
uploading and downloading RocksDB files.

#### Expose RocksDB bloom filter metrics

##### [FLINK-34386](https://issues.apache.org/jira/browse/FLINK-34386)

We expose some RocksDB bloom filter metrics to monitor the effectiveness of the bloom filter
optimization:

- `BLOOM_FILTER_USEFUL`: times the bloom filter has avoided file reads.
- `BLOOM_FILTER_FULL_POSITIVE`: times the bloom FullFilter has not avoided the reads.
- `BLOOM_FILTER_FULL_TRUE_POSITIVE`: times the bloom FullFilter has not avoided the reads and the data actually exists.
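As an illustrative, non-normative sketch, the options discussed so far in this section could
be set programmatically through Flink's `Configuration` as shown below (a `flink-conf.yaml`
entry per key works equally well). The first two keys are quoted from the text above; the
last key is an assumption that the new bloom filter metrics follow the existing
`state.backend.rocksdb.metrics.*` switch pattern.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointOptionsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // FLINK-32070: enable unified checkpoint file merging (off by default).
        conf.setString("state.checkpoints.file-merging.enabled", "true");
        // FLINK-35501: a negative value delegates RocksDB file up/downloads
        // to the common TM IO pool (sized by cluster.io-pool.size).
        conf.setString("state.backend.rocksdb.checkpoint.transfer.thread.num", "-1");
        // FLINK-34386: assumed key, following the state.backend.rocksdb.metrics.*
        // naming pattern, to surface BLOOM_FILTER_USEFUL and related metrics.
        conf.setString("state.backend.rocksdb.metrics.bloom-filter-useful", "true");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... define and execute the job as usual ...
    }
}
```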
#### Manually Compact Small SST Files

##### [FLINK-26050](https://issues.apache.org/jira/browse/FLINK-26050)

In some cases, the number of files produced by the RocksDB state backend grows indefinitely.
Besides leaving lots of small files around, this might cause the task state info (TDD and
checkpoint ACK) to exceed the RPC message size and fail recovery/checkpointing.

In Flink 1.20, you can manually merge such files in the background using the RocksDB API.

### Runtime & Coordination

#### Support Job Recovery from JobMaster Failures for Batch Jobs

##### [FLINK-33892](https://issues.apache.org/jira/browse/FLINK-33892)

In 1.20, we introduced a batch job recovery mechanism that enables batch jobs to recover as
much progress as possible after a JobMaster failover, avoiding the need to rerun tasks that
have already finished.

More information about this feature and how to enable it can be found at:
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/batch/recovery_from_job_master_failure/

#### Extend Curator config option for Zookeeper configuration

##### [FLINK-33376](https://issues.apache.org/jira/browse/FLINK-33376)

Adds support for the following Curator parameters:
`high-availability.zookeeper.client.authorization` (corresponding Curator parameter: `authorization`),
`high-availability.zookeeper.client.max-close-wait` (corresponding Curator parameter: `maxCloseWaitMs`), and
`high-availability.zookeeper.client.simulated-session-expiration-percent` (corresponding Curator parameter: `simulatedSessionExpirationPercent`).

#### More fine-grained timer processing

##### [FLINK-20217](https://issues.apache.org/jira/browse/FLINK-20217)

Firing timers can now be interrupted to speed up checkpointing. Timers that were interrupted
by a checkpoint will be fired shortly after the checkpoint completes.

By default, this feature is disabled. To enable it, set
`execution.checkpointing.unaligned.interruptible-timers.enabled` to `true`. It is currently
supported only by `TableStreamOperators` and `CepOperator`.

#### Add numFiredTimers and numFiredTimersPerSecond metrics

##### [FLINK-35065](https://issues.apache.org/jira/browse/FLINK-35065)

Previously, there was no way of knowing how many timers were being fired by Flink, so it was
impossible to distinguish, even with code profiling, whether an operator was firing only a
couple of heavy timers per second that consumed ~100% of the CPU time, or firing thousands of
timers per second.

We added the following metrics to address this issue:

- `numFiredTimers`: total number of fired timers per operator
- `numFiredTimersPerSecond`: per-second rate of firing timers per operator

#### Support EndOfStreamTrigger and isOutputOnlyAfterEndOfStream Operator Attribute to Optimize Task Deployment

##### [FLINK-34371](https://issues.apache.org/jira/browse/FLINK-34371)

Operators that only generate outputs after all inputs have been consumed are now optimized to
run in blocking mode, and the other operators in the same job will wait to start until these
operators have finished. Such operators include windowing with
`GlobalWindows#createWithEndOfStreamTrigger` (see the sketch below), sorting, etc.
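For illustration, here is a minimal job sketch whose window only fires at end of stream and
therefore qualifies for this optimization. The bounded `fromSequence` source and the key
selector are arbitrary placeholders, not part of this PR.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;

public class EndOfStreamWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder bounded input; any bounded source works here.
        DataStream<Long> numbers = env.fromSequence(1, 1_000_000);

        numbers
                .keyBy(n -> n % 10) // arbitrary key selector for illustration
                // This window fires only once all input is consumed, so the
                // operator produces output only after end of stream and can
                // be scheduled in blocking mode.
                .window(GlobalWindows.createWithEndOfStreamTrigger())
                .reduce(Long::sum)
                .print();

        env.execute("end-of-stream-window-sketch");
    }
}
```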
### SDK

#### Support Full Partition Processing On Non-keyed DataStream

##### [FLINK-34543](https://issues.apache.org/jira/browse/FLINK-34543)

We have introduced some full window processing APIs, allowing collection and processing of
all records in each subtask:

- `mapPartition`: Processes all records using the `MapPartitionFunction`.
- `sortPartition`: Sorts all records by field or key in the full partition window.
- `aggregate`: Aggregates all records in the full partition window.
- `reduce`: Reduces all records in the full partition window.

### Table SQL / API

#### Introduce a New Materialized Table for Simplifying Data Pipelines

##### [FLINK-35187](https://issues.apache.org/jira/browse/FLINK-35187)

We introduced the Materialized Table in Flink SQL, a new table type designed to simplify both
batch and stream data pipelines while providing a consistent development experience.

By specifying the data freshness and the query at creation time, the engine automatically
derives the schema and creates a data refresh pipeline to maintain the specified freshness.

More information about this feature can be found here: [Materialized Table Overview](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/materialized-table/overview/)

#### Introduce Catalog-related Syntax

##### [FLINK-34914](https://issues.apache.org/jira/browse/FLINK-34914)

As the application scenarios of `Catalog` expand, and it is widely applied in services such
as JDBC/Hive/Paimon, `Catalog` plays an increasingly crucial role in Flink.

FLIP-436 introduces DQL syntax to obtain detailed metadata from existing catalogs, and DDL
syntax to modify metadata such as properties or comment in the specified catalog.

Review Comment:
   comment->comments

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
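For readers trying out the FLIP-436 syntax from the reviewed section, a minimal, hypothetical
sketch through the Java `TableEnvironment` follows. The catalog name and property key are
placeholders, and the exact statement forms should be verified against the Flink 1.20 SQL
documentation.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CatalogSyntaxSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // DQL: inspect metadata of an existing catalog ("my_catalog" is a
        // placeholder and must have been registered beforehand).
        tEnv.executeSql("DESCRIBE CATALOG my_catalog").print();

        // DDL: modify catalog properties in place; the property key shown
        // here is illustrative, not prescribed by the release notes.
        tEnv.executeSql("ALTER CATALOG my_catalog SET ('default-database' = 'analytics')");
    }
}
```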