Hello Riza Suminto, Jason Fehr, Joe McDonnell, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/23627
to look at the new patch set (#15).
Change subject: IMPALA-14553: Run schema eval concurrently
......................................................................
IMPALA-14553: Run schema eval concurrently
The majority of time spent in generate-schema-statements.py is in
eval_section for schema operations that shell out, often uploading files
via the hadoop CLI or generating data files. These operations should be
independent of one another, so they can run concurrently.
Runs eval_section once at the beginning, rather than repeating it for
each row in test_vectors, and executes the calls in parallel via a
ThreadPool. Defaults to NUM_CONCURRENT_TESTS threads because the
underlying operations have some concurrency of their own (such as HDFS
mirroring writes).
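A minimal sketch of the approach described above, not the code in the patch. It assumes sections are held in a dict, uses a hypothetical stand-in for eval_section, and assumes the thread count is read from a NUM_CONCURRENT_TESTS environment variable with an arbitrary fallback of 4:

```python
import os
from multiprocessing.pool import ThreadPool


def eval_section(section_text):
    """Hypothetical stand-in: the real eval_section may shell out to a command."""
    return section_text.strip().upper()


def eval_sections_concurrently(sections, num_threads):
    """Evaluate each section exactly once, in parallel, keyed by section name."""
    names = list(sections)
    pool = ThreadPool(processes=num_threads)
    try:
        # map() preserves input order, so results line up with names.
        results = pool.map(eval_section, [sections[name] for name in names])
    finally:
        pool.close()
        pool.join()
    return dict(zip(names, results))


# Assumption: thread count comes from the environment; 4 is an arbitrary fallback.
num_threads = int(os.environ.get("NUM_CONCURRENT_TESTS", "4"))
evaluated = eval_sections_concurrently(
    {"LOAD": " load data ", "CREATE": "create table "}, num_threads)
```

Threads (rather than processes) fit here because the work is dominated by waiting on external commands, so the interpreter lock is not the bottleneck.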
Also collects existing tables into a set to optimize lookup.
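The lookup change can be illustrated as follows. This is a sketch with hypothetical function and table names, not the patch itself: the existing tables are converted to a set once, so each membership check is constant time on average instead of a scan over a list.

```python
def find_missing_tables(required_tables, existing_tables):
    """Return the required tables that do not exist yet, preserving order."""
    existing = set(existing_tables)  # built once, reused for every lookup
    return [table for table in required_tables if table not in existing]


# Illustrative table names only.
missing = find_missing_tables(
    ["alltypes", "alltypestiny", "newtable"],
    ["alltypes", "alltypestiny"])
```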
Reduces generate-schema-statements runtime by ~60%, from 2m30s to 1m.
Confirmed that the contents of logs/data_loading/sql/functional are
identical before and after the change.
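One way such a check could be done is a recursive, byte-for-byte directory comparison. The sketch below is an assumption about how to verify this, not a record of how it was actually verified, and dirs_identical is a hypothetical helper:

```python
import filecmp
import os


def dirs_identical(left, right):
    """Return True if both directory trees hold the same files with the same bytes."""
    # ignore=[] disables filecmp's default ignore list so every entry is checked.
    cmp = filecmp.dircmp(left, right, ignore=[])
    if cmp.left_only or cmp.right_only or cmp.common_funny:
        return False
    # shallow=False forces a content comparison instead of trusting stat data.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    return all(
        dirs_identical(os.path.join(left, sub), os.path.join(right, sub))
        for sub in cmp.common_dirs)
```

Calling it on copies of logs/data_loading/sql/functional saved from a run before and a run after the change would return True only when every generated file matches.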
Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
---
M testdata/bin/generate-schema-statements.py
1 file changed, 136 insertions(+), 49 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/23627/15
--
To view, visit http://gerrit.cloudera.org:8080/23627
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a78d05fd6a0005c83561978713237da2dde6af2
Gerrit-Change-Number: 23627
Gerrit-PatchSet: 15
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>