Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23626
to look at the new patch set (#12).
Change subject: IMPALA-9715: Load testdata with Impala
......................................................................
IMPALA-9715: Load testdata with Impala
Switches most data loading to Impala. Adds LOAD_HIVE for a few
operations that still don't work in Impala. Adds casts to a few types
where Impala failed the query due to loss of precision.
Moves the upload of data to storage into load-data.py via
copy_workload_data_to_hdfs, then uses Impala's LOAD DATA INPATH to
populate the tables. Because LOAD DATA INPATH moves the files into place,
loads that previously shared a source file now load it into one table,
and the other tables select from that table.
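
As a rough illustration of the load-once-then-select pattern described
above (a sketch only, not the code in load-data.py; plan_loads, the
staging path, and the table names are all hypothetical):

```python
def plan_loads(tables_by_source):
    """Build SQL for tables that share a staged source file.

    tables_by_source maps a staged file path to the list of tables that
    need its rows. LOAD DATA INPATH moves the file, so only the first
    table can load it; the remaining tables copy from that table.
    """
    statements = []
    for path, tables in tables_by_source.items():
        first, rest = tables[0], tables[1:]
        # The staged file is consumed by this statement.
        statements.append(
            f"LOAD DATA INPATH '{path}' OVERWRITE INTO TABLE {first}")
        # Later tables can no longer read the file, so select instead.
        for table in rest:
            statements.append(
                f"INSERT OVERWRITE TABLE {table} SELECT * FROM {first}")
    return statements


if __name__ == "__main__":
    # Hypothetical path and table names, for illustration only.
    for stmt in plan_loads(
            {"/staging/tiny/data.csv": ["demo.tiny_text", "demo.tiny_copy"]}):
        print(stmt)
```

The sketch assumes the dependent tables have the same columns as the
first table; anything that needs a different layout would need its own
SELECT list.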
Moves most table data under testdata/data to simplify uploads.
functional-query data now lives in testdata/data and, for generated
data, in testdata/target.
Stops passing scale_factor around in generate-schema-statements.py to
simplify function signatures, since the script always uses the value
given on the command line.
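
A minimal sketch of that kind of simplification, assuming the value is
parsed once and read from a shared options object (parse_args, OPTIONS,
and table_suffix are hypothetical names, not taken from the script):

```python
import argparse


def parse_args(argv):
    """Parse the command line once; --scale_factor defaults to empty."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--scale_factor", default="")
    return parser.parse_args(argv)


# Parsed a single time; helpers read it instead of taking a parameter.
# A real script would pass sys.argv[1:] here.
OPTIONS = parse_args([])


def table_suffix():
    """Return a name suffix derived from the CLI scale factor, if any."""
    # No scale_factor argument: the value always comes from OPTIONS.
    return "_" + OPTIONS.scale_factor if OPTIONS.scale_factor else ""
```

The trade-off is shorter signatures in exchange for helpers that depend
on module-level state.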
This change reduces functional-query load time in my test from 24m20s to
19m44s, without significantly affecting parallel TPC-H and TPC-DS loads.
Change-Id: I43d681a89d49fde9562ea67fd250fad2edd308ae
---
M bin/create_testdata.sh
M bin/load-data.py
M bin/rat_exclude_files.txt
M testdata/bin/create-hbase.sh
M testdata/bin/generate-schema-statements.py
M testdata/common/text_delims_table.py
M testdata/common/widetable.py
R testdata/data/AllTypesError/0901.txt
R testdata/data/AllTypesError/0902.txt
R testdata/data/AllTypesError/0903.txt
R testdata/data/AllTypesErrorNoNulls/0901.txt
R testdata/data/AllTypesErrorNoNulls/0902.txt
R testdata/data/AllTypesErrorNoNulls/0903.txt
R testdata/data/ComplexTypesTbl/README
R testdata/data/ComplexTypesTbl/arrays.orc
R testdata/data/ComplexTypesTbl/arrays.parq
R testdata/data/ComplexTypesTbl/arrays_big.parq
R testdata/data/ComplexTypesTbl/nonnullable.avsc
R testdata/data/ComplexTypesTbl/nonnullable.json
R testdata/data/ComplexTypesTbl/nonnullable.orc
R testdata/data/ComplexTypesTbl/nonnullable.parq
R testdata/data/ComplexTypesTbl/nullable.avsc
R testdata/data/ComplexTypesTbl/nullable.json
R testdata/data/ComplexTypesTbl/nullable.orc
R testdata/data/ComplexTypesTbl/nullable.parq
R testdata/data/ComplexTypesTbl/structs.orc
R testdata/data/ComplexTypesTbl/structs.parq
R testdata/data/ComplexTypesTbl/structs_nested.orc
R testdata/data/ComplexTypesTbl/structs_nested.parq
R testdata/data/CustomerMultiBlock/README
R testdata/data/CustomerMultiBlock/customer_multiblock.parquet
R testdata/data/DimTbl/data.csv
R testdata/data/ImpalaDemoDataset/DEC_00_SF3_P077_with_ann_noheader.csv
R testdata/data/JoinTbl/data.csv
R testdata/data/LikeTbl/data.csv
R testdata/data/NullRows/data.csv
R testdata/data/NullTable/data.csv
R testdata/data/TblWithRaggedColumns/data.csv
R testdata/data/TinyIntTable/data.csv
R testdata/data/TinyTable/data.csv
R testdata/data/avro_null_char/000000_0
R testdata/data/bad_avro_snap/README
R testdata/data/bad_avro_snap/hive2_pre_gregorian_date.avro
R testdata/data/bad_avro_snap/hive3_pre_gregorian_date.avro
R testdata/data/bad_avro_snap/invalid_decimal_schema.avro
R testdata/data/bad_avro_snap/invalid_union.avro
R testdata/data/bad_avro_snap/negative_string_len.avro
R testdata/data/bad_avro_snap/out_of_range_date.avro
R testdata/data/bad_avro_snap/truncated_float.avro
R testdata/data/bad_avro_snap/truncated_string.avro
R testdata/data/bad_parquet_data/README
R testdata/data/bad_parquet_data/dict-encoded-negative-len.parq
R testdata/data/bad_parquet_data/dict-encoded-out-of-bounds.parq
R testdata/data/bad_parquet_data/illegal_decimals.parq
R testdata/data/bad_parquet_data/plain-encoded-negative-len.parq
R testdata/data/bad_parquet_data/plain-encoded-out-of-bounds.parq
R testdata/data/bad_seq_snap/bad_file
R testdata/data/bad_text_gzip/file_not_finished.gz
R testdata/data/empty_parquet_page_source_impala10186/data.csv
R testdata/data/hive_benchmark/grepTiny/part-00000
R testdata/data/hive_benchmark/htmlTiny/Rankings.dat
R testdata/data/hive_benchmark/htmlTiny/UserVisits.dat
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/datasets/tpch/tpch_schema_template.sql
65 files changed, 303 insertions(+), 290 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/23626/12
--
To view, visit http://gerrit.cloudera.org:8080/23626
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I43d681a89d49fde9562ea67fd250fad2edd308ae
Gerrit-Change-Number: 23626
Gerrit-PatchSet: 12
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>