Hello Zoltan Borok-Nagy, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21657
to look at the new patch set (#3).
Change subject: IMPALA-13284: Loading test data on Apache Hive3
......................................................................
IMPALA-13284: Loading test data on Apache Hive3
Loading test data on Apache Hive 3.1.3 fails for several reasons:
- STORED AS JSONFILE is not supported
- STORED BY ICEBERG is not supported. Similarly, STORED BY ICEBERG
STORED AS AVRO is not supported.
- The iceberg-hive-runtime jar is missing from the CLASSPATH of HMS and
Tez jobs.
- Tables created in Impala are not translated to EXTERNAL tables in HMS.
- Hive INSERTs into insert-only tables fail to generate InsertEvents
(HIVE-20067).
This patch fixes the syntax issues by falling back to the older syntax
that Apache Hive 3.1.3 supports:
- Convert STORED AS JSONFILE to ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.JsonSerDe'
- Convert STORED BY ICEBERG to STORED BY
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
- Convert STORED BY ICEBERG STORED AS AVRO to the above, plus
tblproperties('write.format.default'='avro')
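A minimal sketch of the three rewrites above, assuming one CREATE TABLE
statement per call and no CTAS (AS SELECT) clause. The function and
constant names are hypothetical; the actual logic in
generate-schema-statements.py may be structured differently. The Avro
variant is matched before the plain Iceberg rule so that a trailing
STORED AS AVRO is not left behind, and write.format.default is merged
into an existing TBLPROPERTIES list when one is present.

```python
import re

# Hypothetical constants holding the target syntax described above.
ICEBERG_HANDLER = "'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'"
JSON_SERDE = "'org.apache.hadoop.hive.serde2.JsonSerDe'"
AVRO_PROPERTY = "'write.format.default'='avro'"

_ICEBERG_AVRO = re.compile(
    r"STORED\s+BY\s+ICEBERG\s+STORED\s+AS\s+AVRO\b", re.IGNORECASE)
_ICEBERG = re.compile(r"STORED\s+BY\s+ICEBERG\b", re.IGNORECASE)
_JSONFILE = re.compile(r"STORED\s+AS\s+JSONFILE\b", re.IGNORECASE)
_TBLPROPERTIES = re.compile(r"TBLPROPERTIES\s*\(", re.IGNORECASE)


def _add_avro_property(ddl: str) -> str:
    """Add write.format.default=avro to TBLPROPERTIES, creating it if needed."""
    match = _TBLPROPERTIES.search(ddl)
    if match:
        # Prepend the property to the existing TBLPROPERTIES list.
        pos = match.end()
        rest = ddl[pos:]
        sep = "" if rest.lstrip().startswith(")") else ", "
        return ddl[:pos] + AVRO_PROPERTY + sep + rest
    # No TBLPROPERTIES clause yet: append one, keeping any trailing ';'.
    body = ddl.rstrip()
    suffix = ""
    if body.endswith(";"):
        body, suffix = body[:-1].rstrip(), ";"
    return f"{body} TBLPROPERTIES ({AVRO_PROPERTY}){suffix}"


def convert_for_apache_hive3(ddl: str) -> str:
    """Rewrite one CREATE TABLE statement into syntax Apache Hive 3.1.3 accepts."""
    handler_clause = "STORED BY " + ICEBERG_HANDLER
    # Handle the Avro variant first; otherwise the plain Iceberg rule would
    # leave a dangling "STORED AS AVRO" behind.
    if _ICEBERG_AVRO.search(ddl):
        ddl = _add_avro_property(_ICEBERG_AVRO.sub(handler_clause, ddl))
    ddl = _ICEBERG.sub(handler_clause, ddl)
    ddl = _JSONFILE.sub("ROW FORMAT SERDE " + JSON_SERDE, ddl)
    return ddl
```

For example, "CREATE TABLE t (i INT) STORED BY ICEBERG STORED AS AVRO;"
becomes "CREATE TABLE t (i INT) STORED BY
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' TBLPROPERTIES
('write.format.default'='avro');".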
Most of the conversions are done in generate-schema-statements.py. The
one exception is testdata/bin/load-dependent-tables.sql, for which a new
file with the conversions applied is generated at the point of use.
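Generating a converted copy, rather than editing
load-dependent-tables.sql in place, could look roughly like the helper
below. It is hypothetical: the "-converted" naming, the default output
directory and the convert callable are assumptions for illustration, not
necessarily what the scripts in this change do.

```python
import os
import tempfile
from typing import Callable, Optional


def write_converted_copy(src_path: str, convert: Callable[[str], str],
                         out_dir: Optional[str] = None) -> str:
    """Write convert(contents of src_path) to a new file and return its path.

    The checked-in source file is never modified. The "-converted" suffix and
    the default temp directory are hypothetical choices for this sketch.
    """
    with open(src_path, encoding="utf-8") as f:
        text = f.read()
    stem, ext = os.path.splitext(os.path.basename(src_path))
    dst_dir = out_dir if out_dir is not None else tempfile.gettempdir()
    dst_path = os.path.join(dst_dir, stem + "-converted" + ext)
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(convert(text))
    return dst_path
```

Keeping the original file untouched means the same checked-in SQL can
still be fed unchanged to Hive versions that accept the newer syntax.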
The missing iceberg-hive-runtime jar is added to HIVE_AUX_JARS_PATH in
bin/impala-config.sh. This is only needed for Apache Hive3, since CDP
Hive3 already ships the hive-iceberg-handler jar in its lib folder.
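The conditional classpath addition can be sketched as follows, assuming
HIVE_AUX_JARS_PATH holds a comma-separated list of jar paths and that a
boolean says whether Apache Hive (rather than CDP Hive) is in use. The
function name and flag are hypothetical; bin/impala-config.sh is a shell
script and may compute the value differently.

```python
def with_iceberg_runtime(aux_jars_path: str, iceberg_jar: str,
                         use_apache_hive: bool) -> str:
    """Return HIVE_AUX_JARS_PATH with the iceberg-hive-runtime jar appended.

    Only Apache Hive 3 needs the jar; CDP Hive 3 already ships
    hive-iceberg-handler in its lib folder, so the value is returned as-is.
    """
    if not use_apache_hive:
        return aux_jars_path
    # Split the assumed comma-separated list, dropping empty entries.
    jars = [jar for jar in aux_jars_path.split(",") if jar]
    # Avoid adding the same jar twice if the config is sourced repeatedly.
    if iceberg_jar not in jars:
        jars.append(iceberg_jar)
    return ",".join(jars)
```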
To fix the InsertEvents failure, this patch adds the HIVE-20067 patch
and modifies testdata/bin/patch_hive.sh to also recompile the
standalone-metastore submodule.
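A rough sketch of two steps a script like patch_hive.sh might need:
applying patchN-*.diff files in numeric order, and finding which
top-level modules (such as standalone-metastore) a patch touches so they
can be recompiled. It assumes the patchN- naming seen in
patch3-HIVE-20067.diff and git-style "+++ b/" headers; both helpers are
hypothetical and do not reflect the actual shell script.

```python
import re
from typing import Iterable, List

# Assumed naming convention: patch<N>-<description>.diff
_PATCH_NAME = re.compile(r"^patch(\d+)-.*\.diff$")
# Git-style unified diff header for the new file, e.g. "+++ b/ql/src/...".
_DIFF_TARGET = re.compile(r"^\+\+\+ b/([^/\s]+)/", re.MULTILINE)


def ordered_patches(names: Iterable[str]) -> List[str]:
    """Sort patchN-*.diff names by N numerically, so patch10 follows patch3."""
    numbered = []
    for name in names:
        match = _PATCH_NAME.match(name)
        if match:  # skip README and other non-patch files
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def touched_modules(diff_text: str) -> List[str]:
    """Return the sorted top-level source directories a unified diff modifies."""
    return sorted(set(_DIFF_TARGET.findall(diff_text)))
```

Sorting on the parsed integer rather than the file name avoids the
lexicographic trap where "patch10-..." would sort before "patch3-...".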
Some statements in
testdata/datasets/functional/functional_schema_template.sql are modified
to be more reliable when retried.
Tests
- Verified that the test data can be loaded in ubuntu-20.04-from-scratch
Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
---
M bin/impala-config.sh
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch3-HIVE-20067.diff
M testdata/datasets/functional/functional_schema_template.sql
8 files changed, 97 insertions(+), 10 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21657/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
Gerrit-Change-Number: 21657
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>