---------- Forwarded message --------- 보낸사람: Dongwoo Kim <dongwoo7....@gmail.com> Date: 2023년 7월 31일 (월) 오후 11:36 Subject: Re: Flink sql client doesn't work with "partition by" clause To: liu ron <ron9....@gmail.com>
Hi, ron. Actually I'm not receiving any exception message when executing the *partition by* clause in the Flink SQL Client. The job does not fail, but it finishes quickly without executing the expected query job. I'm suspecting that the Flink SQL Client is not recognizing the partition by field(*`hour`*) properly. This is because when I input an obviously incorrect field(*hourr*) as the partition by field, the job behaves in the same manner - it does not fail, but also doesn't perform any operations(reading file ) and ends the query. *e.g) *This below query also doesn't fail but ends right after submission. CREATE TABLE source_table ( id STRING, status STRING, type STRING, hourr INT ) PARTITIONED BY (`hourr`) WITH ( 'connector' = 'filesystem', 'path' = 'hdfs://${our_data_path}month=202307/day=20230714', 'format' = 'parquet' ); SELECT hourr FROM source_table GROUP BY hourr; To provide better context I was using this sql-runner <https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java> to test the partition by clause and it worked as expected, but didn't work when I deployed the flink-session cluster by flink kubernetes operator and executed sql-client.sh in the jobmanger pod. Since as you said this is not an expected behavior, I'll investigate the source code. Thanks Best, dongwoo 2023년 7월 31일 (월) 오후 5:47, liu ron <ron9....@gmail.com>님이 작성: > Hi, dongwoo > > Can you give the exception message about SqlClient, it would be helpful to > find the root cause. In theory, it should work for both cases. > > Best, > Ron > > Dongwoo Kim <dongwoo7....@gmail.com> 于2023年7月28日周五 21:24写道: > >> Hello all, I've realized that the previous mail had some error which >> caused invisible text. So I'm resending the mail below. >> >> Hello all, I have found that the Flink sql client doesn't work with the >> *"partition >> by"* clause. >> Is this a bug? >> It's a bit weird since when I execute the same sql with >> *"tableEnv.executeSql(statement)"* code it works as expected. >> Has anyone tackled this kind of issue? >> I have tested in flink 1.16.1 version. >> >> Thanks in advance >> >> >> >> *- This below code only works with executeSql method in table api but not >> with sql client cli.* >> >> CREATE TABLE source_table >> ( >> id STRING, >> status STRING, >> type STRING, >> `hour` INT >> ) PARTITIONED BY (`hour`) WITH ( >> 'connector' = 'filesystem', >> 'path' = 'hdfs://${our_data_path}month=202307/day=20230714', >> 'format' = 'parquet' >> ); >> >> SELECT `hour` >> FROM source_table >> GROUP BY `hour`; >> >> >> *- This below query works both on the executeSql() method in table api >> and sql client query.* >> >> CREATE TABLE source_table_2 >> ( >> id STRING, >> status STRING, >> type STRING >> ) WITH ( >> 'connector' = 'filesystem', >> 'path' = 'hdfs://${out_data_path}/month=202307/day=20230714', >> 'format' = 'parquet' >> ); >> >> SELECT status >> FROM source_table_2 >> GROUP BY status; >> >> >> >> Best, >> dongwoo >> >> 2023년 7월 28일 (금) 오후 6:19, Dongwoo Kim <dongwoo7....@gmail.com>님이 작성: >> >>> Hello all, I have found that flink sql client doesn't work with >>> "partition by" clause. >>> Is this bug? It's bit weird since when I execute same sql with >>> tableEnv.executeSql(statement) code it works as expected. Has anyone >>> tackled this kind of issue? I have tested in flink 1.16.1 version. >>> Thanks in advance >>> >>> >>> - This below code only works with executeSql method in table api but >>> not with sql client cli. >>> >>> CREATE TABLE source_table >>> >>> ( >>> >>> id STRING, >>> >>> status STRING, >>> >>> type STRING, >>> >>> `hour` INT >>> >>> ) PARTITIONED BY (`hour`) WITH ( >>> >>> 'connector' = 'filesystem', >>> >>> 'path' = 'hdfs://${our_data_path}month=202307/day=20230714', >>> >>> 'format' = 'parquet' >>> >>> ); >>> >>> >>> SELECT `hour` >>> >>> FROM source_table >>> >>> GROUP BY `hour`; >>> >>> >>> >>> - This below query works both on executeSql method in table api and sql >>> client query. >>> >>> >>> CREATE TABLE source_table_2 >>> >>> ( >>> >>> id STRING, >>> >>> status STRING, >>> >>> type STRING >>> >>> ) WITH ( >>> >>> 'connector' = 'filesystem', >>> >>> 'path' = 'hdfs://${out_data_path}/month=202307/day=20230714', >>> >>> 'format' = 'parquet' >>> >>> ); >>> >>> >>> SELECT status >>> >>> FROM source_table_2 >>> >>> GROUP BY status; >>> >>> >>> >>> Best, >>> >>> dongwoo >>> >>> >>>