Riza Suminto created IMPALA-13543:
-------------------------------------

             Summary: Make tpcds_partitioned eligible for single_node_perf_run.py
                 Key: IMPALA-13543
                 URL: https://issues.apache.org/jira/browse/IMPALA-13543
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
            Reporter: Riza Suminto
            Assignee: Riza Suminto


The tpcds_partitioned dataset is a fully partitioned version of the tpcds 
dataset (the latter partitions only the store_sales table). Unlike tpcds, it 
has no default text-format database. Instead, it relies on a pre-existing 
text-format tpcds database, from which the equivalent tpcds_partitioned 
database is populated via INSERT OVERWRITE. It does not have its own query 
set; its queries directory is a symlink that shares 
testdata/workloads/tpcds/queries. Its schema also differs slightly from the 
tpcds dataset: the column "c_last_review_date" in tpcds is 
"c_last_review_date_sk" in tpcds_partitioned (TPC-DS v2.11.0; see the related 
commit in 
[impala-tpcds-kit|https://github.com/cloudera/impala-tpcds-kit/commit/086d7113c8b4172247f83f60f4e274fe3326df11]).

For these reasons, tpcds_partitioned is ineligible for perf-AB-test 
(single_node_perf_run.py), which requires a dataset that can be loaded through 
bin/load-data.py in a single execution. single_node_perf_run.py and related 
scripts need small modifications to accept the tpcds_partitioned dataset for 
benchmarking.
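
The load-order constraint described above can be sketched roughly as follows. 
This is only an illustration, not code from single_node_perf_run.py or 
bin/load-data.py: the DATASET_PREREQUISITES map and the load_order() helper 
are hypothetical names, and the sketch assumes the only dependency is that 
text-format tpcds must be loaded before tpcds_partitioned.

```python
# Hypothetical sketch: these names are illustrative and do not come from
# the Impala scripts. Each dataset maps to the dataset that must already
# be loaded before it, or None if it can be loaded on its own.
DATASET_PREREQUISITES = {
    "tpcds": None,                 # has its own text-format database
    "tpcds_partitioned": "tpcds",  # populated from text-format tpcds
}


def load_order(dataset, prerequisites=DATASET_PREREQUISITES):
    """Return the datasets to load, prerequisites first, ending with `dataset`."""
    order = []
    current = dataset
    while current is not None:
        if current in order:
            raise ValueError("cyclic prerequisite for %s" % current)
        order.append(current)
        current = prerequisites.get(current)
    order.reverse()
    return order


if __name__ == "__main__":
    print(load_order("tpcds_partitioned"))
```

A benchmark driver could walk the returned list and run one load step per 
entry, instead of assuming that a single load covers the requested dataset.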



