Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21304 )

Change subject: IMPALA-12910: Support running TPCH/TPCDS queries for JDBC tables
......................................................................


Patch Set 6:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/21304/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21304/6//COMMIT_MSG@9
PS6, Line 9: This patch adds script to create external JDBC tables for the 
dataset of
Please mention in the commit message how much additional disk space is needed 
for these tables in a development or test environment.


http://gerrit.cloudera.org:8080/#/c/21304/6//COMMIT_MSG@12
PS6, Line 12: It fixes the race condition for the caching of SQL DataSource 
objects by
suggestion: It fixes a race condition when caching SQL DataSource objects by 
using a new DataSourceObjectCache class...


http://gerrit.cloudera.org:8080/#/c/21304/6//COMMIT_MSG@15
PS6, Line 15: java.sql.Connection.close() is not effectively to remove a closed
suggestion: java.sql.Connection.close() fails to remove a closed connection 
from connection pool, which causes JDBC handler threads to wait for available 
connections from the connection pool for a long time.


http://gerrit.cloudera.org:8080/#/c/21304/6/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/21304/6/be/src/service/frontend.cc@99
PS6, Line 99: DEFINE_int32(dbcp_max_conn_pool_size, 8,
These seem like they may need to be RDBMS-specific at some point. Maybe we 
could plan table properties to override them needed? Although that would reduce 
its effectiveness as a limit on user behavior.


http://gerrit.cloudera.org:8080/#/c/21304/6/fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/21304/6/fe/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@214
PS6, Line 214:       Connection connToBeClosed = null;
This section is a little confusing, particularly what's going on when iterator_ 
is not null. Could you add a comment describing what's going on?


http://gerrit.cloudera.org:8080/#/c/21304/6/fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java
File 
fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java:

http://gerrit.cloudera.org:8080/#/c/21304/6/fe/src/main/java/org/apache/impala/extdatasource/jdbc/conf/JdbcStorageConfigManager.java@102
PS6, Line 102:       DatabaseType.valueOf(dbTypeName.toUpperCase());
Is this a fix for something? I don't think it's mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/21304/6/testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql
File testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/21304/6/testdata/datasets/tpcds/tpcds_jdbc_schema_template.sql@22
PS6, Line 22: CREATE EXTERNAL TABLE IF NOT EXISTS {jdbc_db_name}.store_sales (
nit: Could we generate this file? I guess the TPCDS layout isn't likely to 
change often, so probably not a big deal to hardcode it.


http://gerrit.cloudera.org:8080/#/c/21304/6/testdata/datasets/tpch/tpch_jdbc_schema_template.sql
File testdata/datasets/tpch/tpch_jdbc_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/21304/6/testdata/datasets/tpch/tpch_jdbc_schema_template.sql@22
PS6, Line 22: CREATE EXTERNAL TABLE IF NOT EXISTS {jdbc_db_name}.lineitem (
Same question about generating the file as tpcds.


http://gerrit.cloudera.org:8080/#/c/21304/6/tests/query_test/test_tpcds_queries.py
File tests/query_test/test_tpcds_queries.py:

http://gerrit.cloudera.org:8080/#/c/21304/6/tests/query_test/test_tpcds_queries.py@776
PS6, Line 776: @SkipIfDockerizedCluster.runs_slowly
Any idea why the Dockerized environment is particularly slow for this?


http://gerrit.cloudera.org:8080/#/c/21304/6/tests/query_test/test_tpcds_queries.py@800
PS6, Line 800:     self.run_test_case('tpcds-decimal_v2-q1', vector, 
use_db='tpcds_jdbc')
Could this use the parameterized run like in test_tpch_queries.py?



--
To view, visit http://gerrit.cloudera.org:8080/21304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44e8c1bb020e90559c7f22483a7ab7a151b8f48a
Gerrit-Change-Number: 21304
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: gaurav singh <[email protected]>
Gerrit-Comment-Date: Fri, 19 Apr 2024 22:51:33 +0000
Gerrit-HasComments: Yes

Reply via email to