Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23628

to look at the new patch set (#7).

Change subject: Generate parallel data load with batch files
......................................................................

Generate parallel data load with batch files

Creates num_processes batch files for each phase of the schema SQL data
load, so that the files within a phase can be executed in parallel.

Analyzes the SQL statements to build a dependency graph with networkx,
then batches statements by independent subgraph, so that dependent
statements always execute sequentially while independent statements may
execute concurrently.
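
As a rough illustration of the batching idea only (not the code in this
patch), the sketch below groups statements that touch a common table and
spreads the groups over a fixed number of batches. It makes several
assumptions: the function name batch_statements and the TABLE_RE pattern
are hypothetical, table references are detected with a simplistic
"db.table" regex, and a small union-find stands in for networkx so the
example has no third-party dependencies.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern: matches "db.table" style names only.
TABLE_RE = re.compile(r"\b([A-Za-z_]\w*\.[A-Za-z_]\w*)\b")


def batch_statements(statements, num_batches):
    """Group statements sharing a table, then spread groups over batches."""
    parent = list(range(len(statements)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Statements that mention the same table land in the same group.
    first_user = {}
    for idx, stmt in enumerate(statements):
        for table in TABLE_RE.findall(stmt.lower()):
            if table in first_user:
                union(first_user[table], idx)
            else:
                first_user[table] = idx

    # Collect each group in original statement order.
    groups = defaultdict(list)
    for idx, stmt in enumerate(statements):
        groups[find(idx)].append(stmt)

    # Greedy balancing: largest group first, into the shortest batch.
    batches = [[] for _ in range(num_batches)]
    for group in sorted(groups.values(), key=len, reverse=True):
        min(batches, key=len).extend(group)
    return batches
```

Each group stays in a single batch and keeps its original order, so
dependent statements never run concurrently, while separate batches can
be handed to separate processes.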

Speeds up the devdata functional-query load by ~30s, but total load time
is now bound by TPC-DS, so there is no significant change overall:

    Loading TPC-H data OK (Took: 0 min 13 sec)
    Loading functional-query data OK (Took: 0 min 54 sec)
    Loading TPC-DS data OK (Took: 1 min 35 sec)

Change-Id: I9586504f6cb91f873f7ed978fda3df32e759ba90
---
M bin/load-data.py
M infra/python/deps/py3-requirements.txt
M testdata/bin/generate-schema-statements.py
3 files changed, 102 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/23628/7
--
To view, visit http://gerrit.cloudera.org:8080/23628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9586504f6cb91f873f7ed978fda3df32e759ba90
Gerrit-Change-Number: 23628
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
