rmetzger commented on code in PR #751: URL: https://github.com/apache/flink-web/pull/751#discussion_r1679265573
##########
docs/content/posts/2024-07-15-release-1.20.0.md:
##########
@@ -0,0 +1,510 @@
---
authors:
- reswqa:
  name: "Weijie Guo"
  twitter: "WeijieGuo12"
- 1996fanrui:
  name: "Rui Fan"
  twitter: "1996fanrui"

date: "2024-07-15T08:00:00Z"
subtitle: ""
title: Announcing the Release of Apache Flink 1.20
aliases:
- /news/2024/07/15/release-1.20.0.html
---

The Apache Flink PMC is pleased to announce the release of Apache Flink 1.20.0. As usual, we are
looking at a packed release with a wide variety of improvements and new features. Overall, 142
people contributed to this release, completing 13 FLIPs and 300+ issues. Thank you!

Let's dive into the highlights.

# Standing on the Eve of Apache Flink 2.0

The Flink community expects a 1.19 → 1.20 → 2.0 release sequence with the usual 4-5 month release cycle.
We expect to deliver the 2.0 release by the end of 2024, provided that quality is not compromised.

Starting from Flink 1.19, the community decided to officially deprecate multiple APIs that had been approaching
end of life for a while. In 1.20, we further sorted through all relevant APIs that might need to be replaced
or deprecated to clear the way for the 2.0 release:
- Configuration improvements: As Flink moves toward 2.0, we have revisited all runtime and table/SQL
configurations and identified several improvements to enhance user-friendliness and maintainability.
- Deprecation of the legacy `SinkFunction` API: Since its introduction in Flink 1.12, the Unified Sink API
has undergone extensive development and testing. Over multiple release cycles, the API has demonstrated
stability and robustness, aligning with the criteria set forth in FLIP-197 for API stability graduation.
We have therefore promoted the Unified Sink API v2 to `@Public` and deprecated the legacy `SinkFunction`.

It has been seven years since the Flink community's last major release, and we have great expectations for Flink 2.0.
Many major features will land in `2.x`, and some of them are already shipped as MVPs (minimum viable products)
in 1.20:
- Introduce a New Materialized Table for Simplifying Data Pipelines: FLIP-435 is designed to simplify the development of
data processing pipelines. With dynamic tables, uniform SQL statements, and declarative data freshness, users can define batch
and streaming transformations over data in the same way, accelerate ETL pipeline development, and have task scheduling managed
automatically.
- Unified File Merging Mechanism for Checkpoints: The unified file merging mechanism for checkpointing is introduced in
Flink 1.20 as an MVP feature. It allows scattered small checkpoint files to be written into larger files, reducing
the number of file creations and deletions and alleviating the pressure on file system metadata management caused by
the file flooding problem during checkpoints.

# Flink SQL Improvements

## Introduce a New Materialized Table for Simplifying Data Pipelines

We have introduced the Materialized Table in Flink SQL, a new table type designed to simplify both batch and stream
data pipelines while providing a consistent development experience.

By specifying the data freshness and the query at creation time, the engine automatically derives the schema and creates a data
refresh pipeline to maintain the specified freshness.

Here is an example that creates a materialized table which is continuously refreshed with a data freshness of `30` seconds.
```sql
CREATE MATERIALIZED TABLE continuous_users_shops
PARTITIONED BY (ds)
WITH (
  'format' = 'debezium-json',
  'sink.rolling-policy.rollover-interval' = '10s',
  'sink.rolling-policy.check-interval' = '10s'
)
FRESHNESS = INTERVAL '30' SECOND
AS SELECT
  user_id,
  ds,
  SUM (payment_amount_cents) AS payed_buy_fee_sum,
  SUM (1) AS PV
FROM (
  SELECT user_id, order_created_at AS ds, payment_amount_cents
  FROM json_source
  ) AS tmp
GROUP BY user_id, ds;
```

**More Information**
* [FLINK-35187](https://issues.apache.org/jira/browse/FLINK-35187)
* [FLIP-435](https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines)
* [Materialized Table Overview](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/materialized-table/overview/)


## Introduce Catalog-Related Syntax

Catalogs are widely used in services such as JDBC, Hive, and Paimon, and as their application scenarios keep expanding,
`Catalog` plays an increasingly crucial role in Flink.

In Flink 1.20, you can now use `DQL` syntax to obtain detailed metadata from existing catalogs, and `DDL` syntax to modify
metadata such as properties or comments of a specified catalog.

**More Information**
* [FLINK-34914](https://issues.apache.org/jira/browse/FLINK-34914)
* [FLIP-436](https://cwiki.apache.org/confluence/display/FLINK/FLIP-436%3A+Introduce+Catalog-related+Syntax)


## Add DISTRIBUTED BY Clause

Many SQL vendors expose the concepts of `Partitioning`, `Bucketing`, and `Clustering`. With Flink 1.20, we introduce
the concept of `Bucketing` to Flink.

Buckets enable load balancing in an external storage system by splitting data into disjoint subsets. The exact behavior depends
heavily on the semantics of the underlying connector. However, a user can influence the bucketing behavior by
specifying the number of buckets, the bucketing algorithm, and (if the algorithm allows it) the columns which
are used for target bucket calculation. All bucketing components (i.e. bucket number, distribution algorithm, bucket key columns)
are optional from a SQL syntax perspective.

Take the following SQL statements as an example:

```sql
-- declares a hash function on a fixed number of 4 buckets (i.e. HASH(uid) % 4 = target bucket).
CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY HASH(uid) INTO 4 BUCKETS;

-- leaves the selection of an algorithm up to the connector.
CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid) INTO 4 BUCKETS;

-- leaves the number of buckets up to the connector.
CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED BY (uid);

-- only defines the number of buckets.
CREATE TABLE MyTable (uid BIGINT, name STRING) DISTRIBUTED INTO 4 BUCKETS;
```

**More Information**
* [FLINK-33494](https://issues.apache.org/jira/browse/FLINK-33494)
* [FLIP-376](https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause)

# State & Checkpoint Improvements

## Unified File Merging Mechanism for Checkpoints

The unified file merging mechanism for checkpointing is introduced in Flink 1.20 as an MVP ("minimum viable product") feature.
It allows scattered small checkpoint files to be written into larger files, reducing the number of file creations
and deletions and alleviating the pressure on file system metadata management caused by the file flooding problem
during checkpoints.

The mechanism can be enabled by setting `execution.checkpointing.file-merging.enabled` to `true`, for example as shown below.
For more advanced options and the principles behind this feature, please refer to the Checkpointing documentation.
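As a minimal sketch, the option can be set like any other Flink configuration value, for instance from the SQL Client via a
standard `SET` statement (the checkpoint interval below is only an illustrative value; use whatever interval your job already has):

```sql
-- enable the unified file merging mechanism for checkpoints (new in Flink 1.20)
SET 'execution.checkpointing.file-merging.enabled' = 'true';

-- an illustrative checkpoint interval; file merging only takes effect when checkpointing is enabled
SET 'execution.checkpointing.interval' = '30s';
```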
**More Information**
* [FLINK-32070](https://issues.apache.org/jira/browse/FLINK-32070)
* [FLIP-306](https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints)
* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/fault-tolerance/checkpointing/#unify-file-merging-mechanism-for-checkpoints-experimental)

## Manually Compact Small SST Files

In some cases, the number of files produced by the RocksDB state backend grows indefinitely. Besides leaving many small files
around, this might cause the task state info (TDD and checkpoint ACK) to exceed the RPC message size and fail recovery or checkpointing.

In Flink 1.20, you can manually merge such files in the background using the RocksDB API.

**More Information**
* [FLINK-26050](https://issues.apache.org/jira/browse/FLINK-26050)

## Reorganizing all State & Checkpointing Configuration Options

In Flink 1.20, all options related to state and checkpointing have been reorganized and categorized by the prefixes listed below:
1. `execution.checkpointing`: all configurations associated with checkpointing and savepoints.
2. `execution.state-recovery`: all configurations pertinent to state recovery.
3. `state.*`: all configurations related to state access.
    a. `state.backend.*`: specific options for individual state backends, such as RocksDB.
    b. `state.changelog`: configurations for the changelog, as outlined in FLIP-158, including the options for the "Durable Short-term Log" (DSTL).
    c. `state.latency-track`: configurations related to latency tracking of state access.

Meanwhile, all the original options scattered across different prefixes have been annotated as `@Deprecated`. For the detailed list of options,
please refer to the documentation below.

**More Information**
* [Checkpointing](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#checkpointing)
* [Recovery](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#recovery)
* [State Backend](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-backends)
* [State Changelog](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-changelog-options)
* [Latency Tracking](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/#state-latency-tracking-options)

# Batch Processing Improvements

## Support Job Recovery from JobMaster Failures for Batch Jobs

In Flink 1.20, we introduced a batch job recovery mechanism that enables batch jobs to recover as much progress as possible
after a `JobMaster` failover, avoiding the need to rerun tasks that have already finished.

**More Information**
* [FLINK-33892](https://issues.apache.org/jira/browse/FLINK-33892)
* [FLIP-383](https://cwiki.apache.org/confluence/display/FLINK/FLIP-383%3A+Support+Job+Recovery+from+JobMaster+Failures+for+Batch+Jobs)
* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/batch/recovery_from_job_master_failure/)

## Support Dynamic Parallelism Inference for HiveSource

In Flink 1.20, we have introduced support for dynamic source parallelism inference in batch jobs for the `Hive` source connector.
This allows the connector to dynamically determine parallelism based on the actual partitions, with dynamic partition pruning applied.
Additionally, we have introduced a new configuration option, `table.exec.hive.infer-source-parallelism.mode`, that lets users
choose between static and dynamic inference modes for source parallelism, as sketched below.
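As a minimal sketch from the SQL Client: the option key is the one introduced above, while the mode values follow the
static/dynamic wording of FLIP-445 and should be checked against the Hive connector documentation for your exact version:

```sql
-- fall back to static parallelism inference (mode name assumed from FLIP-445's static/dynamic wording)
SET 'table.exec.hive.infer-source-parallelism.mode' = 'STATIC';

-- use dynamic inference, which derives parallelism from the pruned partitions at runtime
SET 'table.exec.hive.infer-source-parallelism.mode' = 'DYNAMIC';
```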
It should be noted that in Flink 1.20 the previous configuration option `table.exec.hive.infer-source-parallelism` has been
marked as deprecated, but it will continue to serve as the switch for automatic parallelism inference until it is fully phased out.

**More Information**
* [FLINK-35293](https://issues.apache.org/jira/browse/FLINK-35293)
* [FLIP-445](https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource)

# DataStream API Improvements

The `DataSet` API has already been formally deprecated and will be removed in Flink 2.0.
Flink users are recommended to migrate from the `DataSet` API to the `DataStream` API, `Table` API and `SQL` for their data
processing requirements.

## Support Full Partition Processing on DataStream API

Before 1.20, the `DataStream` API did not directly support aggregations on non-keyed streams (subtask-scope aggregations).
To do so, one first had to assign the subtask id to the records and then turn the stream into a keyed stream, which
incurs additional overhead. Flink 1.20 introduces the `FullPartitionWindow` API to support this directly.

Suppose we want to count the number of records in each partition and emit the counts to a downstream operator. This can be done as follows:

```java
inputStream.fullWindowPartition()
        .mapPartition(
                new MapPartitionFunction<Record, Long>() {
                    @Override
                    public void mapPartition(
                            Iterable<Record> values, Collector<Long> out) throws Exception {
                        // Count all records in this subtask's partition and emit a single result.
                        long counter = 0;
                        for (Record value : values) {
                            counter++;
                        }
                        out.collect(counter);
                    }
                });
```

**More Information**
* [FLINK-34543](https://issues.apache.org/jira/browse/FLINK-34543)
* [FLIP-380](https://cwiki.apache.org/confluence/display/FLINK/FLIP-380%3A+Support+Full+Partition+Processing+On+Non-keyed+DataStream)

# Important Configuration Changes for Flink 2.0

As Apache Flink progresses toward version 2.0, several configuration options are being changed or deprecated to
improve user-friendliness and maintainability.

## Using Proper Type for Configuration Options

Some configuration options, such as `client.heartbeat.interval`, have been updated to the `Duration` type in a backward-compatible manner.
The full list is available in [FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359).

The following configurations have been updated to the `Enum` type in a backward-compatible manner (see the short example after these lists):
- `taskmanager.network.compression.codec`
- `table.optimizer.agg-phase-strategy`

The following configurations have been updated to the `Int` type in a backward-compatible manner:
- `yarn.application-attempts`
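To illustrate what backward compatibility means here, a minimal sketch using `table.optimizer.agg-phase-strategy` from the
`Enum` list above: the string values that worked before the type change keep working unchanged and are now simply parsed into
an enum internally (`TWO_PHASE` is just one of the documented choices, used here for illustration):

```sql
-- the same string value that was accepted before the type change continues to work
SET 'table.optimizer.agg-phase-strategy' = 'TWO_PHASE';
```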
**More Information**
* [FLINK-35359](https://issues.apache.org/jira/browse/FLINK-35359)

## Deprecate Multiple Configuration Options

In preparation for the release of Flink 2.0 later this year, the community has decided to officially deprecate
multiple configuration options that were approaching end of life for a while.

The following configurations have been deprecated as we are phasing out the hash-based blocking shuffle:
- `taskmanager.network.sort-shuffle.min-parallelism`
- `taskmanager.network.blocking-shuffle.type`

The following configurations have been deprecated as we are phasing out the legacy hybrid shuffle:
- `taskmanager.network.hybrid-shuffle.spill-index-region-group-size`
- `taskmanager.network.hybrid-shuffle.num-retained-in-memory-regions-max`
- `taskmanager.network.hybrid-shuffle.enable-new-mode`

The following configurations have been deprecated to simplify the configuration of network buffers:
- `taskmanager.network.memory.buffers-per-channel`
- `taskmanager.network.memory.floating-buffers-per-gate`
- `taskmanager.network.memory.max-buffers-per-channel`
- `taskmanager.network.memory.max-overdraft-buffers-per-gate`
- `taskmanager.network.memory.exclusive-buffers-request-timeout-ms` (Please use `taskmanager.network.memory.buffers-request-timeout` instead.)

The configuration `taskmanager.network.batch-shuffle.compression.enabled` has been deprecated. Please set `taskmanager.network.compression.codec` to `NONE` to disable compression instead.

The following Netty-related configurations are no longer recommended for use and have been deprecated:
- `taskmanager.network.netty.num-arenas`
- `taskmanager.network.netty.server.numThreads`
- `taskmanager.network.netty.client.numThreads`
- `taskmanager.network.netty.server.backlog`
- `taskmanager.network.netty.sendReceiveBufferSize`
- `taskmanager.network.netty.transport`

The following configurations are unnecessary and have been deprecated:
- `taskmanager.network.max-num-tcp-connections`
- `fine-grained.shuffle-mode.all-blocking`

These options were previously used for fine-tuning TPC testing but are no longer needed by the current Flink planner:
- `table.exec.range-sort.enabled`
- `table.optimizer.rows-per-local-agg`
- `table.optimizer.join.null-filter-threshold`
- `table.optimizer.semi-anti-join.build-distinct.ndv-ratio`
- `table.optimizer.shuffle-by-partial-key-enabled`
- `table.optimizer.smj.remove-sort-enabled`
- `table.optimizer.cnf-nodes-limit`

These options were introduced for the now-obsolete `FilterableTableSource` interface:
- `table.optimizer.source.aggregate-pushdown-enabled`
- `table.optimizer.source.predicate-pushdown-enabled`

**More Information**
* [FLINK-35461](https://issues.apache.org/jira/browse/FLINK-35461)
* [FLINK-35473](https://issues.apache.org/jira/browse/FLINK-35473)

## New and Updated Configuration Options

### SQL Client Option
`sql-client.display.max-column-width` has been replaced with `table.display.max-column-width`, as illustrated below.
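A minimal sketch of setting the new key from the SQL Client; the width value `100` is arbitrary and only for illustration:

```sql
-- configure the maximum displayed column width via the new option name
SET 'table.display.max-column-width' = '100';
```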
### Batch Execution Options
The following options have been moved from `org.apache.flink.table.planner.codegen.agg.batch.HashAggCodeGenerator`
to `org.apache.flink.table.api.config` and promoted to `@PublicEvolving`:

- `table.exec.local-hash-agg.adaptive.enabled`
- `table.exec.local-hash-agg.adaptive.sampling-threshold`
- `table.exec.local-hash-agg.adaptive.distinct-value-rate-threshold`

### Lookup Hint Options
The following options have been moved from `org.apache.flink.table.planner.hint.LookupJoinHintOptions`
to `org.apache.flink.table.api.config.LookupJoinHintOptions` and promoted to `@PublicEvolving`:

- `table`
- `async`
- `output-mode`
- `capacity`
- `timeout`
- `retry-predicate`
- `retry-strategy`
- `fixed-delay`
- `max-attempts`

### Optimizer Options
The following options have been moved from `org.apache.flink.table.planner.plan.optimize.RelNodeBlock`
to `org.apache.flink.table.api.config.OptimizerConfigOptions` and promoted to `@PublicEvolving`:

- `table.optimizer.union-all-as-breakpoint-enabled`
- `table.optimizer.reuse-optimize-block-with-digest-enabled`

### Aggregate Optimizer Option
`table.optimizer.incremental-agg-enabled` has been moved from `org.apache.flink.table.planner.plan.rules.physical.stream.IncrementalAggregateRule`
to `org.apache.flink.table.api.config.OptimizerConfigOptions` and promoted to `@PublicEvolving`.

**More Information**
* [FLINK-35461](https://issues.apache.org/jira/browse/FLINK-35461)
* [FLINK-35473](https://issues.apache.org/jira/browse/FLINK-35473)

# Upgrade Notes

The Flink community tries to ensure that upgrades are as seamless as possible.
However, certain changes may require users to make adjustments to certain parts
of the program when upgrading to version 1.20. Please refer to the
[release notes](https://nightlies.apache.org/flink/flink-docs-release-1.20/release-notes/flink-1.20/)
for a comprehensive list of adjustments to make and issues to check during the
upgrading process.

# List of Contributors

The Apache Flink community would like to express gratitude to all the
contributors who made this release possible:

Review Comment:
   Will this be rendered as a list (a newline after each name)? In the past, this was just one block of names:
   https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/#list-of-contributors

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org