WilliamShine opened a new issue #4816:
URL: https://github.com/apache/hudi/issues/4816
Running the following SQL in Flink throws an NPE:
CREATE TABLE `account` (
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `_ts_ms` bigint,
  `_op` string,
  `_hoodie_is_deleted` boolean,
  `id` int,
  `val` int,
  `created_at` bigint,
  `hh` string,
  `dt` string,
  PRIMARY KEY (`id`) NOT ENFORCED
) PARTITIONED BY (`dt`)
WITH (
  'connector' = 'hudi',
  'path' = 's3://de-hive-test/ods_test_debezium_nick.db/test_ods_monitor1',
  'table.type' = 'MERGE_ON_READ'
);
CREATE TABLE IF NOT EXISTS `printTable` (
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `_ts_ms` bigint,
  `_op` string,
  `_hoodie_is_deleted` boolean,
  `id` int,
  `val` int,
  `created_at` bigint,
  `hh` string,
  `dt` string
) WITH (
  'connector' = 'print'
);
INSERT INTO printTable SELECT * FROM account;
Why does MergeOnReadInputFormat.getRequiredPosWithCommitTime add
_hoodie_commit_time to the schema fields?
If the SQL already selects the _hoodie_commit_time column, the schema
becomes _hoodie_commit_time, _hoodie_commit_time, _hoodie_commit_seqno, ...,
that is, the commit-time column appears twice.
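To illustrate what I mean, here is a minimal, runnable Java sketch of that
behavior (a simplification under my assumptions, not the actual Hudi code;
the class name and array shapes below are hypothetical):

import java.util.Arrays;

public class RequiredPosSketch {
  // Hypothetical simplification: position 0 (_hoodie_commit_time) is always
  // prepended, even when the required positions already contain it.
  static int[] getRequiredPosWithCommitTime(int[] requiredPos) {
    int[] withCommitTime = new int[requiredPos.length + 1];
    withCommitTime[0] = 0; // _hoodie_commit_time
    System.arraycopy(requiredPos, 0, withCommitTime, 1, requiredPos.length);
    return withCommitTime;
  }

  public static void main(String[] args) {
    // SELECT * already includes _hoodie_commit_time at position 0.
    int[] selectStar = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    // Prints [0, 0, 1, ..., 12]: position 0 appears twice, so the parquet
    // reader is asked to decode that column chunk a second time.
    System.out.println(Arrays.toString(getRequiredPosWithCommitTime(selectStar)));
  }
}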
Then columnReaders[i].readToVector(num, writableVectors[i]) in
ParquetColumnarRowSplitReader.nextBatch (via pageReader.readPage in
AbstractColumnReader.readToVector) reads the _hoodie_commit_time column twice.
But in Parquet 1.11, ColumnChunkPageReadStore.readPage does
DataPage compressedPage = compressedPages.poll();
so on the second read compressedPage is null, and in the end an NPE is thrown.
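To make the failure mode concrete, here is a minimal, runnable Java sketch of
a page queue that is drained on the first pass (the classes below are
simplified stand-ins I wrote for illustration, not the real Parquet classes):

import java.util.ArrayDeque;
import java.util.Deque;

public class DrainedPageQueueSketch {
  static class DataPage {
    int valueCount() { return 100; }
  }

  // Simplified stand-in for the per-column page reader in
  // ColumnChunkPageReadStore.
  static class PageReader {
    private final Deque<DataPage> compressedPages = new ArrayDeque<>();

    PageReader(int pages) {
      for (int i = 0; i < pages; i++) {
        compressedPages.add(new DataPage());
      }
    }

    DataPage readPage() {
      // As in Parquet 1.11: poll() returns null once the queue is drained,
      // and the caller dereferences the result without a null check.
      DataPage compressedPage = compressedPages.poll();
      return compressedPage;
    }
  }

  public static void main(String[] args) {
    PageReader reader = new PageReader(1);
    // First read (first occurrence of _hoodie_commit_time): OK.
    System.out.println(reader.readPage().valueCount());
    // Second read (duplicated column): poll() returns null -> NPE here.
    System.out.println(reader.readPage().valueCount());
  }
}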
Can you tell me why the _hoodie_commit_time column is added by default?