Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21955 )
Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality
......................................................................

Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21955/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/21955/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test@1328
PS8, Line 1328: |  row-size=16B cardinality=600

On second thought, I don't like this extreme reduction.

numRows(tpch_parquet.customer) = 150000
ndv(c_nationkey) = 25
cardinality(00:SCAN) = 6000, because of the selectivity of the predicate c_nationkey = 16 (150000 / 25).

We can't accurately measure the selectivity of this predicate during planning. The output cardinality of 600 is a further reduction from the HAVING clause count(*) < 150000 (the Planner defaults to 0.1 selectivity when it can't estimate better).

If we do tuple-based reduction for this lone c_custkey expression, then we should do it for all expressions, whether or not their scan cardinality is also reduced by a predicate.

This is the ExecSummary from a real run:

ExecSummary:
Operator                 #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                      1      1  209.867us  209.867us                        4.01 MB        4.00 MB
06:EXCHANGE                   1      1  273.126us  273.126us   4.08K         600   48.00 KB       21.09 KB  UNPARTITIONED
F02:EXCHANGE SENDER           1      1  622.052us  622.052us                       25.21 KB       80.00 KB
03:AGGREGATE                  1      1    0.000ns    0.000ns   4.08K         600    5.03 MB       10.00 MB  FINALIZE
02:HASH JOIN                  1      1   39.887ms   39.887ms  61.27K      91.47K  104.05 MB       80.83 MB  INNER JOIN, PARTITIONED
|--05:EXCHANGE                1      1    5.047ms    5.047ms   1.50M       1.50M    2.97 MB        5.75 MB  HASH(o_custkey)
|  F01:EXCHANGE SENDER        2      2   25.312ms   28.458ms                       68.27 KB       48.00 KB
|  01:SCAN HDFS               2      2    5.138ms    7.039ms   1.50M       1.50M   10.38 MB       40.00 MB  tpch_parquet.orders
04:EXCHANGE                   1      1   31.329us   31.329us   4.08K       6.00K   96.00 KB       72.59 KB  HASH(c_custkey)
F00:EXCHANGE SENDER           1      1  135.283us  135.283us                       33.63 KB       56.00 KB
00:SCAN HDFS                  1      1  110.972ms  110.972ms   4.08K       6.00K    2.63 MB       48.00 MB  tpch_parquet.customer

--
To view, visit http://gerrit.cloudera.org:8080/21955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Gerrit-Change-Number: 21955
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 24 Oct 2024 20:56:00 +0000
Gerrit-HasComments: Yes
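
For reference, the estimate arithmetic quoted in the comment can be reproduced with a few lines of Python. This is only a rough sketch, not Impala's planner code: the function names are hypothetical, the uniform-distribution assumption for the equality predicate is my reading of "150000 / 25", and the actual row count is taken from the rounded "4.08K" shown in the ExecSummary.

```python
# Hypothetical sketch of the cardinality arithmetic; NOT Impala's planner code.

DEFAULT_SELECTIVITY = 0.1  # fallback selectivity quoted in the comment


def scan_cardinality(num_rows: int, ndv: int) -> int:
    """Estimate rows passing an equality predicate, assuming values are
    spread uniformly over `ndv` distinct values (an assumption)."""
    return num_rows // ndv


def having_cardinality(input_cardinality: int,
                       selectivity: float = DEFAULT_SELECTIVITY) -> int:
    """Apply a fixed selectivity to an input cardinality estimate."""
    return round(input_cardinality * selectivity)


scan_est = scan_cardinality(150_000, 25)  # 00:SCAN estimate from the comment
agg_est = having_cardinality(scan_est)    # estimate after count(*) < 150000

# "4.08K" as displayed in the ExecSummary, so only approximate.
actual_agg_rows = 4080
underestimate_factor = actual_agg_rows / agg_est

print(scan_est, agg_est, round(underestimate_factor, 1))
```

With these inputs the two stacked reductions leave the 03:AGGREGATE estimate several times below the row count reported in the run, which is the gap the comment objects to.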
