I am using the newest 1.10 blink planner.
Perhaps it is because of the method I used to write the parquet file:
receive the Kafka message, transform each message into a Java class object, write the
object to HDFS using StreamingFileSink, and add the HDFS path as a partition of
the Hive table.
Regardless of the order of the fields in the Hive DDL statement, the
Hive client works, as long as the field names match the Java object's
field names.
But the Flink SQL client does not.
DataStream<RobotUploadData0101> sourceRobot = source.map(x -> transform(x));
final StreamingFileSink<RobotUploadData0101> sink;
sink = StreamingFileSink
    .forBulkFormat(
        new Path("hdfs://172.19.78.38:8020/user/root/wanglei/robotdata/parquet"),
        ParquetAvroWriters.forReflectRecord(RobotUploadData0101.class))
    .build();
For example:
RobotUploadData0101 has two fields: robotId int, robotTime long
CREATE TABLE `robotparquet`( `robotid` int, `robottime` bigint ) and
CREATE TABLE `robotparquet`( `robottime` bigint, `robotid` int)
are the same for the Hive client, but different for the Flink SQL client.
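The difference described above can be illustrated with a minimal, self-contained Java sketch. It does not use Flink, Hive, or Parquet code, and it does not reproduce the 0 values seen in the Flink SQL client; it only contrasts resolving columns by name with resolving them by position. The class name ColumnResolutionSketch, both helper methods, the file field order, and the robotTime value are hypothetical and chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnResolutionSketch {

    // Name-based resolution: look up each table column in the file schema,
    // ignoring case, so the column order in the DDL does not matter.
    static Object[] readByName(List<String> fileFields, Object[] fileRow,
                               List<String> tableColumns) {
        Object[] out = new Object[tableColumns.size()];
        for (int i = 0; i < tableColumns.size(); i++) {
            out[i] = null; // stays null if no file field has this name
            for (int j = 0; j < fileFields.size(); j++) {
                if (fileFields.get(j).equalsIgnoreCase(tableColumns.get(i))) {
                    out[i] = fileRow[j];
                    break;
                }
            }
        }
        return out;
    }

    // Position-based resolution: the i-th table column takes the i-th file
    // field, so the DDL order must match the order in the file.
    static Object[] readByPosition(Object[] fileRow, int columnCount) {
        return Arrays.copyOf(fileRow, columnCount);
    }

    public static void main(String[] args) {
        // Assumed file schema order (robotId first) and a made-up robotTime value.
        List<String> fileFields = Arrays.asList("robotId", "robotTime");
        Object[] fileRow = {1291097, 1586247420000L};

        // DDL with the columns declared in the opposite order.
        List<String> ddl = Arrays.asList("robottime", "robotid");

        // By name, robottime and robotid each get their own value.
        System.out.println(Arrays.toString(readByName(fileFields, fileRow, ddl)));
        // By position, robottime receives the robotId value and vice versa.
        System.out.println(Arrays.toString(readByPosition(fileRow, ddl.size())));
    }
}
```

With name-based resolution both DDL variants return the same values per column; with position-based resolution only the DDL whose order matches the file does.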
Is this expected behavior?
Thanks,
Lei
From: Jark Wu
Date: 2020-04-09 14:48
To: [email protected]; Jingsong Li; lirui
CC: user
Subject: Re: fink sql client not able to read parquet format table
Hi Lei,
Are you using the newest 1.10 blink planner?
I'm not familiar with Hive and parquet, but I know @Jingsong Li and
@[email protected] are experts on this. Maybe they can help on this question.
Best,
Jark
On Tue, 7 Apr 2020 at 16:17, [email protected]
<[email protected]> wrote:
The Hive table is stored as Parquet.
Under the Hive client:
hive> select robotid from robotparquet limit 2;
OK
1291097
1291044
But under the Flink SQL client the result is 0:
Flink SQL> select robotid from robotparquet limit 2;
robotid
0
0
Any insight on this?
Thanks,
Lei