[ 
https://issues.apache.org/jira/browse/HIVE-29476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060706#comment-18060706
 ] 

Stamatis Zampetakis commented on HIVE-29476:
--------------------------------------------

The idea for this set of tests came while I was thinking about how to validate 
the changes in HIVE-26830. [~thomasrebele], feedback is very welcome.

> Add tests for TPC-DS 30TB metastore content
> -------------------------------------------
>
>                 Key: HIVE-29476
>                 URL: https://issues.apache.org/jira/browse/HIVE-29476
>             Project: Hive
>          Issue Type: Test
>          Components: Test
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> The [TPC-DS 30TB plan regression 
> suite|https://github.com/apache/hive/blob/2fa85ab5f6683e16125b30b63b4189b95b098b5a/itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezTPCDS30TBPerfCliDriver.java]
>  is based on a pre-built database dump that is loaded via dockerized 
> [Postgres 
> database|https://github.com/apache/hive/blob/2fa85ab5f6683e16125b30b63b4189b95b098b5a/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/PostgresTPCDS.java].
>  The content of the dump is not validated anywhere, and we can only verify 
> what's inside either by manually inspecting the dump or by drawing 
> inferences from the query plans. The dump has been updated a few times 
> already, and another update is imminent in 
> HIVE-26830. The creation of the dump is a manual process, so it would be 
> helpful to have a basic set of tests that verify the state of the metastore 
> and how the dump evolves.
> Interesting information that we would like to capture includes:
>  * table and column data types
>  * constraints (FK, NOT NULL)
>  * basic table stats such as num_rows, numPartitions, etc.
>  * basic column stats such as min, max, NDV, num_nulls, etc.
> The above can be captured by adding DESCRIBE FORMATTED qtests for each TPC-DS 
> table and column. As an added bonus, this will increase the test coverage for 
> DESCRIBE statements.
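
For illustration, the DESCRIBE FORMATTED approach described above could be 
sketched as a small generator that emits one statement per table and one per 
column, suitable for pasting into a .q file. This is only a sketch under 
stated assumptions: the helper name describe_statements and the 
SCHEMA_SUBSET mapping are hypothetical, and the mapping lists only a few 
TPC-DS tables and columns rather than the full schema. It is not the approach 
taken in the actual pull request.

```python
# Hypothetical helper (not part of Hive): builds the body of a .q file with
# one DESCRIBE FORMATTED statement per table and one per column.
# SCHEMA_SUBSET is a tiny illustrative subset of the TPC-DS schema.
SCHEMA_SUBSET = {
    "store_sales": ["ss_item_sk", "ss_quantity"],
    "date_dim": ["d_date_sk", "d_year"],
}


def describe_statements(schema):
    """Return DESCRIBE FORMATTED statements for every table and column."""
    statements = []
    for table, columns in schema.items():
        # Table-level statement: expected to surface column data types,
        # constraints, and basic table stats.
        statements.append(f"DESCRIBE FORMATTED {table};")
        for column in columns:
            # Column-level statement: expected to surface column stats
            # such as min, max, NDV, and num_nulls.
            statements.append(f"DESCRIBE FORMATTED {table} {column};")
    return statements


if __name__ == "__main__":
    print("\n".join(describe_statements(SCHEMA_SUBSET)))
```

Generating the statements from a single table-to-columns mapping keeps the 
qtest input in step with the schema whenever the dump is regenerated.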



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
