[jira] [Created] (FLINK-34155) Recurring SqlExecutionException
Jeyhun Karimov created FLINK-34155:
--
Summary: Recurring SqlExecutionException
Key: FLINK-34155
URL: https://issues.apache.org/jira/browse/FLINK-34155
Project: Flink
Issue Type: Bug
Components: Tests
Affects Versions: 1.8.0
Reporter: Jeyhun Karimov
Attachments: disk-full.log

When analyzing a very big Maven log file in our CI system, I found a recurring {{SqlExecutionException}} (a subset of the log file is attached):

{{org.apache.flink.table.gateway.service.utils.SqlExecutionException: Only 'INSERT/CREATE TABLE AS' statement is allowed in Statement Set or use 'END' statement to submit Statement Set.}}

which leads to:

{{06:31:41,155 [flink-rest-server-netty-worker-thread-22] ERROR org.apache.flink.table.gateway.rest.handler.statement.FetchResultsHandler [] - Unhandled exception.}}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
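For context, this error is raised when a statement other than INSERT or CREATE TABLE AS appears inside an open statement set. A minimal sketch of the rule (table names are hypothetical):

{code:sql}
-- Once a statement set is opened, only INSERT / CREATE TABLE AS
-- statements are accepted until the closing END submits the set.
EXECUTE STATEMENT SET
BEGIN
  INSERT INTO sink_a SELECT * FROM src;  -- allowed
  INSERT INTO sink_b SELECT * FROM src;  -- allowed
  -- SELECT * FROM src;                  -- would raise the SqlExecutionException above
END;
{code}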
[jira] [Created] (FLINK-34442) Support optimizations for pre-partitioned [external] data sources
Jeyhun Karimov created FLINK-34442:
--
Summary: Support optimizations for pre-partitioned [external] data sources
Key: FLINK-34442
URL: https://issues.apache.org/jira/browse/FLINK-34442
Project: Flink
Issue Type: Improvement
Components: Table SQL / API, Table SQL / Planner
Affects Versions: 1.18.1
Reporter: Jeyhun Karimov

There are some use cases in which data sources are pre-partitioned:
- A Kafka topic is already partitioned w.r.t. some key
- Multiple Flink jobs materialize their outputs and subsequently read them as input

One of the main benefits is that we might avoid unnecessary shuffling. There is already an experimental feature in DataStream that supports a subset of these use cases [1]. We should support this for Flink Table/SQL as well.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/
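To illustrate the kind of query that could benefit, here is a sketch (table schemas and options are hypothetical; connector options are elided as in the examples below):

{code:sql}
-- Both tables are already partitioned by the join key at the source,
-- so the planner could, in principle, skip the shuffle before the join.
CREATE TABLE orders (user_id INT, amount INT) PARTITIONED BY (user_id) WITH ( ... );
CREATE TABLE clicks (user_id INT, url STRING) PARTITIONED BY (user_id) WITH ( ... );

SELECT o.user_id, o.amount, c.url
FROM orders o
JOIN clicks c ON o.user_id = c.user_id;
{code}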
[jira] [Created] (FLINK-34459) Results column names should match SELECT clause expression names
Jeyhun Karimov created FLINK-34459:
--
Summary: Results column names should match SELECT clause expression names
Key: FLINK-34459
URL: https://issues.apache.org/jira/browse/FLINK-34459
Project: Flink
Issue Type: Improvement
Components: Table SQL / Client
Affects Versions: 1.18.1
Reporter: Jeyhun Karimov

When printing {{SELECT}} results, Flink outputs a generated expression name when the expression is not a column reference, an alias, or an OVER expression. For example, {{select a, a + 1 from T}} would result in

{code:java}
+----+---+--------+
| op | a | EXPR$1 |
+----+---+--------+
| +I | 1 |      2 |
| +I | 1 |      2 |
| +I | 1 |      2 |
+----+---+--------+
{code}

Instead of the generated {{EXPR$1}}, it would be nice to have {{a + 1}} as the column name (which is the case in some other data processing systems, such as Spark).
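As a workaround today, an explicit alias produces the intended header; a minimal sketch, assuming the table {{T}} from the example above:

{code:sql}
-- Aliasing the expression with a quoted identifier yields the
-- desired column name instead of the generated EXPR$1.
SELECT a, a + 1 AS `a + 1` FROM T;
{code}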
[jira] [Created] (FLINK-34924) Support partition pushdown for join queries
Jeyhun Karimov created FLINK-34924:
--
Summary: Support partition pushdown for join queries
Key: FLINK-34924
URL: https://issues.apache.org/jira/browse/FLINK-34924
Project: Flink
Issue Type: Improvement
Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jeyhun Karimov

Consider the following tables:

{code:java}
create table partitionedTable1 (
  a int,
  b int,
  c int
) partitioned by (a, b) with ( ... )
{code}

{code:java}
create table partitionedTable2 (
  c int,
  d int,
  e int
) partitioned by (d, e) with ( ... )
{code}

and the following query:

{code:java}
select t1.b
from partitionedTable1 t1
inner join partitionedTable2 t2 on t1.a = t2.d
where t1.a > 1
{code}

Currently, partition pushdown only considers the filter clause ({{t1.a > 1}}) and pushes the matching partitions to the source operator. However, we should also be able to push down partitions based on the join clause. Note that the partitioned columns are exactly the join fields, so we can fetch the existing partitions from each table, intersect them, and push the intersection down to both source operators.
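A worked example of the proposed pruning, with hypothetical partition values for the two tables above:

{code:sql}
-- Suppose (hypothetically) the existing partitions are:
--   partitionedTable1: a IN (1, 2, 3)
--   partitionedTable2: d IN (2, 3, 5)
-- The join key equality t1.a = t2.d, combined with the filter t1.a > 1,
-- means only partitions {2, 3} can contribute rows on either side:
select t1.b
from partitionedTable1 t1           /* read only partitions a IN (2, 3) */
inner join partitionedTable2 t2     /* read only partitions d IN (2, 3) */
  on t1.a = t2.d
where t1.a > 1
{code}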