[
https://issues.apache.org/jira/browse/HIVE-29476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060706#comment-18060706
]
Stamatis Zampetakis commented on HIVE-29476:
--------------------------------------------
The idea for this set of tests came while I was thinking about how to validate
the changes in HIVE-26830. [~thomasrebele], feedback is very welcome.
> Add tests for TPC-DS 30TB metastore content
> -------------------------------------------
>
> Key: HIVE-29476
> URL: https://issues.apache.org/jira/browse/HIVE-29476
> Project: Hive
> Issue Type: Test
> Components: Test
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
>
> The [TPC-DS 30TB plan regression
> suite|https://github.com/apache/hive/blob/2fa85ab5f6683e16125b30b63b4189b95b098b5a/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java]
> is based on a pre-built database dump that is loaded via a dockerized
> [Postgres
> database|https://github.com/apache/hive/blob/2fa85ab5f6683e16125b30b63b4189b95b098b5a/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/PostgresTPCDS.java].
> The content of the dump is not validated anywhere, so we can only verify
> what's inside by manually inspecting the dump or by drawing indirect
> conclusions from the query plans. The dump has been updated a few times
> already, and another update is imminent in HIVE-26830. The creation of the
> dump is a manual process, so it would be helpful to have a basic set of
> tests that verify the state of the metastore and track how the dump
> evolves.
> Interesting information that we would like to capture includes:
> * table and column data types
> * constraints (FK, NOT NULL)
> * basic table stats such as num_rows, numPartitions, etc.
> * basic column stats such as min, max, NDV, num_nulls, etc.
> The above can be captured by adding DESCRIBE FORMATTED qtests for each TPC-DS
> table and column. As an added bonus, this will increase the coverage of
> DESCRIBE statements.
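
To make the proposal above concrete, here is a minimal sketch of how the
DESCRIBE FORMATTED statements for such qtests could be generated. It is an
illustration only, not the approach taken in the pull request: the helper
name describe_statements and the TPCDS_COLUMNS mapping are hypothetical, and
the mapping lists just a small subset of the TPC-DS tables and columns.

```python
from typing import Dict, List

# Illustrative subset of the TPC-DS schema (assumption: a real test would
# enumerate every table and column in the dump).
TPCDS_COLUMNS: Dict[str, List[str]] = {
    "date_dim": ["d_date_sk", "d_year"],
    "store_sales": ["ss_sold_date_sk", "ss_item_sk"],
}


def describe_statements(schema: Dict[str, List[str]]) -> List[str]:
    """Build one table-level and one column-level statement per entry."""
    statements: List[str] = []
    for table in sorted(schema):
        # Table-level form: data types, constraints, basic table stats.
        statements.append(f"DESCRIBE FORMATTED {table};")
        for column in schema[table]:
            # Column-level form: column statistics for a single column.
            statements.append(f"DESCRIBE FORMATTED {table} {column};")
    return statements


if __name__ == "__main__":
    # Print the statements so they can be pasted into a .q file.
    print("\n".join(describe_statements(TPCDS_COLUMNS)))
```

Sorting the table names keeps the generated output stable, which matters when
the result is compared against a golden file.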