Wenzhe Zhou has uploaded a new patch set (#8). (
http://gerrit.cloudera.org:8080/21304 )
Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
......................................................................
IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
This patch adds a script to create external JDBC tables for the TPCH
and TPCDS datasets, and adds unit tests that run TPCH and TPCDS queries
against external JDBC tables with Impala-Impala federation. Note that
JDBC tables are mapping tables; they don't take additional disk space.
It fixes a race condition in the caching of SQL DataSource objects by
using a new DataSourceObjectCache class, which checks the reference
count before closing a SQL DataSource.
Adds a new query option 'clean_dbcp_ds_cache' with a default value of
true. When it is set to false, a SQL DataSource object is not closed
when its reference count reaches 0; instead it is kept in the cache
until it has been idle for more than 5 minutes.
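The reference-counting and idle-eviction behavior described here can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patch's Java DataSourceObjectCache: the names RefCountedCache, acquire, release, evict_idle and FakeDataSource, as well as the injectable clock, are hypothetical. Only the 5-minute idle limit and the meaning of 'clean_dbcp_ds_cache' come from the description.

```python
import threading
import time


class FakeDataSource:
    """Hypothetical stand-in for a pooled SQL DataSource."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class RefCountedCache:
    """Hypothetical sketch of a reference-counted cache with idle eviction."""

    def __init__(self, idle_timeout_s=300.0, clock=time.monotonic):
        self._lock = threading.Lock()
        self._entries = {}  # key -> [resource, ref_count, last_release_time]
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock

    def acquire(self, key, factory):
        """Return the cached resource for key, creating it on first use."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                entry = [factory(), 0, self._clock()]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, key, clean_cache=True):
        """Drop one reference; close at zero only if clean_cache is true."""
        to_close = None
        with self._lock:
            entry = self._entries.get(key)
            if entry is None or entry[1] == 0:
                return
            entry[1] -= 1
            entry[2] = self._clock()
            if entry[1] == 0 and clean_cache:
                to_close = self._entries.pop(key)[0]
        if to_close is not None:
            to_close.close()

    def evict_idle(self):
        """Close unreferenced entries idle longer than the timeout."""
        now = self._clock()
        with self._lock:
            expired = [k for k, e in self._entries.items()
                       if e[1] == 0 and now - e[2] > self._idle_timeout_s]
            closed = [self._entries.pop(k)[0] for k in expired]
        for resource in closed:
            resource.close()
        return len(closed)
```

Because the count check and the removal from the map happen under one lock, a resource is only closed after it has been unlinked from the cache, so no other caller can acquire it while it is being closed. In this sketch a periodic cleanup thread would be expected to call evict_idle().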
java.sql.Connection.close() sometimes fails to remove a closed
connection from the connection pool, which causes JDBC worker threads
to wait a long time for an available connection from the pool.
The workaround is to call the BasicDataSource.invalidateConnection()
API to close a connection.
Two flag variables are added for the DBCP configuration properties
'maxTotal' and 'maxWaitMillis'. Note that the 'maxActive' and 'maxWait'
properties were renamed to 'maxTotal' and 'maxWaitMillis' respectively
in apache.commons.dbcp v2.
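As a small illustration of the rename, the sketch below translates a property map that uses the older names into the v2 names. The helper to_dbcp_v2_properties and the sample values are hypothetical and are not part of the patch; only the two name pairs come from the description.

```python
# Property renames from DBCP 1.x to DBCP 2.x, as listed in the description.
DBCP_V2_RENAMES = {
    "maxActive": "maxTotal",
    "maxWait": "maxWaitMillis",
}


def to_dbcp_v2_properties(props):
    """Return a copy of props with 1.x names replaced by their 2.x names."""
    converted = {}
    for name, value in props.items():
        # Names without a known rename pass through unchanged.
        converted[DBCP_V2_RENAMES.get(name, name)] = value
    return converted
```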
Also fixes a small bug in handling the database type: type strings
specified by the user could be lower case or mixed case, but the code
compared them against upper-case strings.
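The usual remedy for this kind of bug is to normalize the user-supplied string before comparing it. The sketch below shows the idea in Python. The actual fix lives in the Java frontend code, and the function name normalize_database_type and the exact set of accepted tokens (in particular "MYSQL") are assumptions made for illustration.

```python
# Assumed set of type tokens; IMPALA and POSTGRES appear in the sample
# commands, MYSQL is a guess at the token used for MySQL.
SUPPORTED_TYPES = ("IMPALA", "POSTGRES", "MYSQL")


def normalize_database_type(raw):
    """Upper-case the user-supplied type so comparisons ignore case."""
    normalized = raw.strip().upper()
    if normalized not in SUPPORTED_TYPES:
        raise ValueError("Unsupported database type: %r" % raw)
    return normalized
```

With this, "postgres", "Postgres" and "POSTGRES" all resolve to the same value.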
testdata/bin/create-tpc-jdbc-tables.py supports creating JDBC tables
for Impala-Impala, Postgres and MySQL.
The following sample commands create TPCDS JDBC tables for Impala-Impala
federation with a remote coordinator running at 10.19.10.86, and for a
Postgres server running at 10.19.10.86:
  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=IMPALA --database_host=10.19.10.86 --clean

  ${IMPALA_HOME}/testdata/bin/create-tpc-jdbc-tables.py \
    --jdbc_db_name=tpcds_jdbc --workload=tpcds \
    --database_type=POSTGRES --database_host=10.19.10.86 \
    --database_name=tpcds --clean
Remaining Issues:
- tpcds-decimal_v2-q80a fails because the returned rows do not match
the expected results for some decimal values. This will be fixed in
IMPALA-13018.
Testing:
- Passed core-test.
- Manually verified that only one SQL DataSource object was created for
test_tpcds_queries.py::TestTpcdsQueryForJdbcTables when the query option
'clean_dbcp_ds_cache' was set to false, and that the SQL DataSource
object was closed by the cleanup thread.
Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
---
M be/src/exec/data-source-scan-node.cc
M be/src/service/frontend.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/ExternalDataSource.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
A fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DataSourceObjectCache.java
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/DatabaseAccessor.java
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
M fe/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M testdata/bin/create-load-data.sh
A testdata/bin/create-tpc-jdbc-tables.py
A testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql
A testdata/datasets/tpch/tpch_jdbc_schema_template.sql
M tests/query_test/test_tpcds_queries.py
M tests/query_test/test_tpch_queries.py
22 files changed, 1,873 insertions(+), 99 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/21304/8
--
To view, visit http://gerrit.cloudera.org:8080/21304
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Gerrit-Change-Number: 21304
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: gaurav singh <[email protected]>