Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/22231 )
Change subject: IMPALA-13620: Refresh compute_table_stats.py script ...................................................................... Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh File testdata/bin/compute-table-stats.sh: http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@58 PS8, Line 58: --table_names="store_sales,store_returns,web_sales,web_returns, Nit: this list of table names is repeated, could be moved into a variable to eliminate duplication. This suggestion is applicable below as well. http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@63 PS8, Line 63: # Partitioned tables are computed separately above. Nit: At first glance, I missed that the table names are now in the exclude list in this command. Please consider adding to the comment wording such as "Compute stats in parallel on all remaining tables in the tpcds_partitioned_parquet_snap db." This suggestion is applicable below as well. http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@92 PS8, Line 92: --table_names="alltypesagg,alltypesaggmultifilesnopart,alltypesaggnonulls, Nit: Consider matching the previous patterns where the second compute stats on the same db uses the `--exclude_table_names` parameter. http://gerrit.cloudera.org:8080/#/c/22231/8/tests/util/compute_table_stats.py File tests/util/compute_table_stats.py: http://gerrit.cloudera.org:8080/#/c/22231/8/tests/util/compute_table_stats.py@34 PS8, Line 34: DEFAULT_PARALLELISM = min(24, multiprocessing.cpu_count()) My dev machine has 16 CPUs. If it's critical to match the degree of parallelism to the cpu count, then we need to use a different method of dynamically obtaining the cpu count at script runtime. Another option would be to use the IMPALA_BUILD_THREADS environment variable. -- To view, visit http://gerrit.cloudera.org:8080/22231 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1ebf02f95b957e7dda3a30622b87e8fca3197699 Gerrit-Change-Number: 22231 Gerrit-PatchSet: 8 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Tue, 31 Dec 2024 19:28:12 +0000 Gerrit-HasComments: Yes
