[jira] [Created] (FLINK-34155) Recurring SqlExecutionException

2024-01-18 Thread Jeyhun Karimov (Jira)
Jeyhun Karimov created FLINK-34155:
--

 Summary: Recurring SqlExecutionException
 Key: FLINK-34155
 URL: https://issues.apache.org/jira/browse/FLINK-34155
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.8.0
Reporter: Jeyhun Karimov
 Attachments: disk-full.log

When analyzing very big maven log file in our CI system, I found out that there 
is a recurring {{{}SqlException (subset of the log file is attached){}}}:

 
{{org.apache.flink.table.gateway.service.utils.SqlExecutionException: Only 
'INSERT/CREATE TABLE AS' statement is allowed in Statement Set or use 'END' 
statement to submit Statement Set.}}
 
 

which leads to:

 
{{06:31:41,155 [flink-rest-server-netty-worker-thread-22] ERROR 
org.apache.flink.table.gateway.rest.handler.statement.FetchResultsHandler [] - 
Unhandled exception.}}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34442) Support optimizations for pre-partitioned [external] data sources

2024-02-13 Thread Jeyhun Karimov (Jira)
Jeyhun Karimov created FLINK-34442:
--

 Summary: Support optimizations for pre-partitioned [external] data 
sources
 Key: FLINK-34442
 URL: https://issues.apache.org/jira/browse/FLINK-34442
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API, Table SQL / Planner
Affects Versions: 1.18.1
Reporter: Jeyhun Karimov


There are some use-cases in which data sources are pre-partitioned:

- Kafka broker is already partitioned w.r.t. some key
- There are multiple Flink jobs  that materialize their outputs and read them 
as input subsequently

One of the main benefits is that we might avoid unnecessary shuffling. 
There is already an experimental feature in DataStream to support a subset of 
these [1].
We should support this for Flink Table/SQL as well. 

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34459) Results column names should match SELECT clause expression names

2024-02-18 Thread Jeyhun Karimov (Jira)
Jeyhun Karimov created FLINK-34459:
--

 Summary: Results column names should match SELECT clause 
expression names
 Key: FLINK-34459
 URL: https://issues.apache.org/jira/browse/FLINK-34459
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Affects Versions: 1.18.1
Reporter: Jeyhun Karimov


When printing {{SQL SELECT}} results, Flink will output generated expression 
name when the expression type is not {{column reference or alias or over.}}
For example, select a, a + 1 from T would result in 


{code:java}
++-+-+
| op |   a |  EXPR$1 |
++-+-+
| +I |   1 |   2 |
| +I |   1 |   2 |
| +I |   1 |   2 |
++-+-+
{code}

Instead of the generated {{EXPR$1}} it would be nice to have {{a + 1}} (which 
is the case in some other data processing systems like Spark).




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34924) Support partition pushdown for join queries

2024-03-23 Thread Jeyhun Karimov (Jira)
Jeyhun Karimov created FLINK-34924:
--

 Summary: Support partition pushdown for join queries
 Key: FLINK-34924
 URL: https://issues.apache.org/jira/browse/FLINK-34924
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.19.0
Reporter: Jeyhun Karimov


Consider the following tables: 
{code:java}
create table partitionedTable1 (
   a int, 
   b int, 
   c int)  
partitioned by (a, b) 
with ( ... ){code}
 
{code:java}
create table partitionedTable2 (
c int, 
d int, 
e int)  
 partitioned by (d, e) 
 with ( ... )  {code}
 

And the following query:
{code:java}
select t1.b 
from partitionedTable1 t1 inner join partitionedTable2 t2 
on t1.a = t2.d 
where t1.a > 1{code}
 

Currently, the partition pushdown only considers the filter clause (t1.a > 1) 
and pushes the related partitions to the source operator. 

However, we should be able to also pushdown partitions because of join clause. 
Note that partitioned columns are the same as join fields. So, we can fetch 
existing partitions from each table, intersect them, and push their 
intersection to their source operators. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)