lincoln-lil commented on code in PR #721:
URL: https://github.com/apache/flink-web/pull/721#discussion_r1512211263
##########
docs/content/posts/2024-03-xx-release-1.19.0.md:
##########
@@ -0,0 +1,466 @@
---
authors:
- LincolnLee:
  name: "Lincoln Lee"
  twitter: lincoln_86xy

date: "2024-03-07T22:00:00Z"
subtitle: ""
title: Announcing the Release of Apache Flink 1.19
aliases:
- /news/2024/03/xx/release-1.19.0.html
---

The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are looking at a packed
release with a wide variety of improvements and new features. Overall, 166 people contributed to this release, completing
33 FLIPs and 600+ issues. Thank you!

Let's dive into the highlights.

# Towards a Streaming Lakehouse

## Flink SQL Improvements

### Support Setting Parallelism for Table/SQL Sources

In Flink 1.19, you can now set a custom parallelism for performance tuning via the `scan.parallelism`
option. The first connector to support this is DataGen (the Kafka connector is on the way). Here is an example
using the SQL Client:

```sql
-- set parallelism within the ddl
CREATE TABLE Orders (
    order_number BIGINT,
    price        DECIMAL(32,2),
    buyer        ROW<first_name STRING, last_name STRING>,
    order_time   TIMESTAMP(3)
) WITH (
    'connector' = 'datagen',
    'scan.parallelism' = '4'
);

-- or set parallelism via dynamic table option
SELECT * FROM Orders /*+ OPTIONS('scan.parallelism'='4') */;
```

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sourcessinks/#scan-table-source)
* [FLIP-367: Support Setting Parallelism for Table/SQL Sources](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150)


### SQL Gateway Supports Configuring Java Options

A new option, `env.java.opts.sql-gateway`, has been introduced for specifying Java options, so you can fine-tune
memory settings, garbage collection behavior, and other relevant Java parameters for the SQL Gateway.

**More Information**
* [FLINK-33203](https://issues.apache.org/jira/browse/FLINK-33203)
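For example, the option can be set in the Flink configuration file before the gateway is started. A minimal sketch, assuming you want a larger heap and the G1 collector (the JVM flag values are illustrative assumptions, not tuning recommendations):

```yaml
# Applies only to the SQL Gateway process, not to the JobManager/TaskManagers.
# The flag values below are illustrative, not recommendations.
env.java.opts.sql-gateway: -Xms2g -Xmx2g -XX:+UseG1GC
```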
### Support Configuring Different State TTLs Using SQL Hints

Starting from Flink 1.18, Table API and SQL users can set state time-to-live (TTL) individually for
stateful operators via the SQL compiled plan. In Flink 1.19, users have a more flexible way to
specify custom TTL values for regular joins and group aggregations directly within their queries by
[utilizing the STATE_TTL hint](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#state-ttl-hints).
This improvement means that you no longer need to alter your compiled plan to set specific TTLs for
these frequently used operators. With the introduction of `STATE_TTL` hints, you can streamline your workflow and
dynamically adjust the TTL based on your operational requirements.

Here is an example:
```sql
-- set state ttl for join
SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */ *
FROM Orders LEFT OUTER JOIN Customers
    ON Orders.o_custkey = Customers.c_custkey;

-- set state ttl for aggregation
SELECT /*+ STATE_TTL('o' = '1d') */ o_orderkey, SUM(o_totalprice) AS revenue
FROM Orders AS o
GROUP BY o_orderkey;
```

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#state-ttl-hints)
* [FLIP-373: Support Configuring Different State TTLs using SQL Hint](https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint)


### Support Named Parameters for Functions and Procedures

Named parameters can now be used when calling a function or stored procedure. With named parameters,
users no longer need to specify parameters strictly by position; they can simply pass each parameter's
name along with its value. Optional parameters that are not specified default to null.

Here is an example of defining a function with one mandatory parameter and two optional parameters using named parameters:
```java
public static class NamedArgumentsTableFunction extends TableFunction<Object> {

    @FunctionHint(
            output = @DataTypeHint("STRING"),
            arguments = {
                @ArgumentHint(name = "in1", isOptional = false, type = @DataTypeHint("STRING")),
                @ArgumentHint(name = "in2", isOptional = true, type = @DataTypeHint("STRING")),
                @ArgumentHint(name = "in3", isOptional = true, type = @DataTypeHint("STRING"))})
    public void eval(String arg1, String arg2, String arg3) {
        collect(arg1 + ", " + arg2 + ", " + arg3);
    }
}
```
When calling the function in SQL, parameters can be specified by name, for example:
```sql
SELECT * FROM TABLE(myFunction(in1 => 'v1', in3 => 'v3', in2 => 'v2'));
```
The optional parameters can also be omitted:
```sql
SELECT * FROM TABLE(myFunction(in1 => 'v1'));
```

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#named-parameters)
* [FLIP-387: Support named parameters for functions and call procedures](https://cwiki.apache.org/confluence/display/FLINK/FLIP-387%3A+Support+named+parameters+for+functions+and+call+procedures)

### Completing the Remaining Window TVF Aggregation Features

* **SESSION Window TVF in Streaming Mode**

  Users can now use the SESSION window TVF in streaming mode. A simple example is as follows:
  ```sql
  -- session window with partition keys
  SELECT * FROM TABLE(
      SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES));

  -- apply aggregation on the session windowed table with partition keys
  SELECT window_start, window_end, item, SUM(price) AS total_price
  FROM TABLE(
      SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES))
  GROUP BY item, window_start, window_end;
  ```
* **Changelog Inputs for Window TVF Aggregation**

  Window aggregation operators (generated based on window TVFs) can now handle changelog
  streams (e.g., CDC data sources) without issue. Users are encouraged to migrate from the legacy
  window aggregation syntax to the new one for more complete feature support.

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session)


### MiniBatch Optimization for Regular Joins

Record amplification is a pain point when performing cascading joins in Flink. In Flink 1.19,
the new mini-batch optimization can be applied to regular joins to reduce intermediate results in such
cascading join scenarios.
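Mini-batch processing is disabled by default. As a sketch of how it might be switched on from the SQL Client before submitting such a join (the latency and size values below are illustrative and should be tuned per workload):

```sql
-- Buffer input records into small batches to reduce state accesses
-- and intermediate record amplification.
SET 'table.exec.mini-batch.enabled' = 'true';
-- Flush a buffered batch at least every 5 seconds ...
SET 'table.exec.mini-batch.allow-latency' = '5s';
-- ... or as soon as 5000 records have been buffered.
SET 'table.exec.mini-batch.size' = '5000';
```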
<div style="text-align: center;">
<img src="/img/blog/2024-03-xx-release-1.19.0/minibatch_join.png" style="width:90%;margin:15px">
</div>

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/tuning/#minibatch-regular-joins)
* [FLIP-415: Introduce a new join operator to support minibatch](https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch)
* [FLINK-34219](https://issues.apache.org/jira/browse/FLINK-34219)

# Key Improvements

## Runtime & Coordination

### Support Dynamic Source Parallelism Inference for Batch Jobs

Flink 1.19 supports dynamic source parallelism inference for batch jobs, which allows
source connectors to dynamically infer their parallelism based on the actual amount of data to consume.
This is a significant improvement over previous versions, which could only assign a fixed default
parallelism to source vertices.
Source connectors need to implement the inference interface to enable dynamic parallelism inference;
currently, the FileSource connector already has this functionality in place.
Additionally, the configuration `execution.batch.adaptive.auto-parallelism.default-source-parallelism`
is used as the upper bound of source parallelism inference, and it no longer defaults to 1.
Instead, if it is not set, the upper bound of allowed parallelism set via
`execution.batch.adaptive.auto-parallelism.max-parallelism` is used. If that configuration is
also not set, the default parallelism set via `parallelism.default` or `StreamExecutionEnvironment#setParallelism()`
is used instead.

**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/elastic_scaling/#enable-dynamic-parallelism-inference-support-for-sources)
* [FLIP-379: Support dynamic source parallelism inference for batch jobs](https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs)

### Support Standard YAML for Flink Configuration

Starting with Flink 1.19, Flink has officially introduced full support for the standard YAML 1.2
syntax. The default configuration file has been changed to `config.yaml`, located in the `conf/`
directory; users should modify this file directly to configure Flink.
If users want to keep using the legacy configuration file `flink-conf.yaml`, they only need to copy it
into the `conf/` directory. Once the legacy `flink-conf.yaml` is detected, Flink will prioritize it
as the configuration file. Note that in the upcoming Flink 2.0, the `flink-conf.yaml` configuration
file will no longer work.
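For instance, because `config.yaml` is parsed as standard YAML 1.2, options can be written either as the familiar flat dotted keys or as nested keys. A minimal sketch of a `conf/config.yaml` (the option values are illustrative only):

```yaml
# Flat, dotted keys still work:
# taskmanager.numberOfTaskSlots: 2
# ...and so does standard YAML nesting:
taskmanager:
  numberOfTaskSlots: 2
jobmanager:
  memory:
    process:
      size: 1600m
parallelism:
  default: 4
```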
**More Information**
* [Documentation](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#flink-configuration-file)
* [FLIP-366: Support standard YAML for FLINK configuration](https://cwiki.apache.org/confluence/display/FLINK/FLIP-366%3A+Support+standard+YAML+for+FLINK+configuration?src=contextnavpagetreemode)

### Profiling JobManager/TaskManager on Flink Web

Review Comment:
   @yuchen-ecnu I've added two images based on a test job run with the latest rc1 package, please help review the content, thanks!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org