Fwd: Flink sql client doesn't work with "partition by" clause

Dongwoo Kim Mon, 31 Jul 2023 07:38:26 -0700

---------- Forwarded message ---------
보낸사람: Dongwoo Kim <dongwoo7....@gmail.com>
Date: 2023년 7월 31일 (월) 오후 11:36
Subject: Re: Flink sql client doesn't work with "partition by" clause
To: liu ron <ron9....@gmail.com>



Hi, ron.

Actually I'm not receiving any exception message when executing the *partition
by* clause in the Flink SQL Client.

The job does not fail, but it finishes quickly without executing the
expected query job.

I'm suspecting that the Flink SQL Client is not recognizing the partition
by field(*`hour`*) properly.

This is because when I input an obviously incorrect field(*hourr*) as the
partition by field, the job behaves in the same manner - it does not fail,
but also doesn't perform any operations(reading file ) and ends the query.

*e.g) *This below query also doesn't fail but ends right after submission.

CREATE TABLE source_table
(
    id     STRING,
    status STRING,
    type   STRING,
    hourr INT
) PARTITIONED BY (`hourr`) WITH (
      'connector' = 'filesystem',
      'path' = 'hdfs://${our_data_path}month=202307/day=20230714',
      'format' = 'parquet'
      );

SELECT hourr
FROM source_table
GROUP BY hourr;


To provide better context
I was using this sql-runner
<https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java>
to test the partition by clause and it worked as expected, but didn't work
when I deployed the flink-session cluster by flink kubernetes operator and
executed sql-client.sh in the jobmanger pod.

Since as you said this is not an expected behavior, I'll investigate the
source code.

Thanks


Best,
dongwoo



2023년 7월 31일 (월) 오후 5:47, liu ron <ron9....@gmail.com>님이 작성:

> Hi, dongwoo
>
> Can you give the exception message about SqlClient, it would be helpful to
> find the root cause. In theory, it should work for both cases.
>
> Best,
> Ron
>
> Dongwoo Kim <dongwoo7....@gmail.com> 于2023年7月28日周五 21:24写道：
>
>> Hello all, I've realized that the previous mail had some error which
>> caused invisible text. So I'm resending the mail below.
>>
>> Hello all, I have found that the Flink sql client doesn't work with the 
>> *"partition
>> by"* clause.
>> Is this a bug?
>> It's a bit weird since when I execute the same sql with
>> *"tableEnv.executeSql(statement)"* code it works as expected.
>> Has anyone tackled this kind of issue?
>> I have tested in flink 1.16.1 version.
>>
>> Thanks in advance
>>
>>
>>
>> *- This below code only works with executeSql method in table api but not
>> with sql client cli.*
>>
>> CREATE TABLE source_table
>> (
>>     id     STRING,
>>     status STRING,
>>     type   STRING,
>>     `hour` INT
>> ) PARTITIONED BY (`hour`) WITH (
>>       'connector' = 'filesystem',
>>       'path' = 'hdfs://${our_data_path}month=202307/day=20230714',
>>       'format' = 'parquet'
>>       );
>>
>> SELECT `hour`
>> FROM source_table
>> GROUP BY `hour`;
>>
>>
>> *- This below query works both on the executeSql() method in table api
>> and sql client query.*
>>
>> CREATE TABLE source_table_2
>> (
>>     id       STRING,
>>     status   STRING,
>>     type     STRING
>> ) WITH (
>>       'connector' = 'filesystem',
>>       'path' = 'hdfs://${out_data_path}/month=202307/day=20230714',
>>       'format' = 'parquet'
>>       );
>>
>> SELECT status
>> FROM source_table_2
>> GROUP BY status;
>>
>>
>>
>> Best,
>> dongwoo
>>
>> 2023년 7월 28일 (금) 오후 6:19, Dongwoo Kim <dongwoo7....@gmail.com>님이 작성:
>>
>>> Hello all, I have found that flink sql client doesn't work with
>>> "partition by" clause.
>>> Is this bug? It's bit weird since when I execute same sql with
>>> tableEnv.executeSql(statement) code it works as expected. Has anyone
>>> tackled this kind of issue? I have tested in flink 1.16.1 version.
>>> Thanks in advance
>>>
>>>
>>> - This below code only works with executeSql method in table api but
>>> not with sql client cli.
>>>
>>> CREATE TABLE source_table
>>>
>>> (
>>>
>>>     id       STRING,
>>>
>>>     status   STRING,
>>>
>>>     type     STRING,
>>>
>>>     `hour`        INT
>>>
>>> ) PARTITIONED BY (`hour`) WITH (
>>>
>>>       'connector' = 'filesystem',
>>>
>>>       'path' = 'hdfs://${our_data_path}month=202307/day=20230714',
>>>
>>>       'format' = 'parquet'
>>>
>>>       );
>>>
>>>
>>> SELECT `hour`
>>>
>>> FROM source_table
>>>
>>> GROUP BY `hour`;
>>>
>>>
>>>
>>> - This below query works both on executeSql method in table api and sql
>>> client query.
>>>
>>>
>>> CREATE TABLE source_table_2
>>>
>>> (
>>>
>>>     id       STRING,
>>>
>>>     status   STRING,
>>>
>>>     type     STRING
>>>
>>> ) WITH (
>>>
>>>       'connector' = 'filesystem',
>>>
>>>       'path' = 'hdfs://${out_data_path}/month=202307/day=20230714',
>>>
>>>       'format' = 'parquet'
>>>
>>>       );
>>>
>>>
>>> SELECT status
>>>
>>> FROM source_table_2
>>>
>>> GROUP BY status;
>>>
>>>
>>>
>>> Best,
>>>
>>> dongwoo
>>>
>>>
>>>

Fwd: Flink sql client doesn't work with "partition by" clause

Reply via email to