Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/23219 )
Change subject: IMPALA-13437 (part 2): Implement cost-based tuple cache placement ...................................................................... Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/23219/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23219/5//COMMIT_MSG@11 PS5, Line 11: locations within a budget. First, it eliminates unprofitable locations This will need a docs update. Can you file a ticket for documentation around tuple caching? http://gerrit.cloudera.org:8080/#/c/23219/5/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/23219/5/common/thrift/ImpalaService.thrift@1043 PS5, Line 1043: // cost_based: Pick locations that provide cost improvements while imposing a limit nit: I'd swap the order of these to match their enum order. http://gerrit.cloudera.org:8080/#/c/23219/5/fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java File fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java: http://gerrit.cloudera.org:8080/#/c/23219/5/fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java@99 PS5, Line 99: LOG.trace(node.getDisplayLabel() + " eliminated due to missing statistics"); So we won't use tuple caching on locations that are missing statistics? We should call that out in docs when we write them. Could possibly log this at info level, but could get verbose. Maybe consider updating the warning about missing statistics in the profile when tuple caching is enabled? Should all the false outcomes result an an ineligibility reason we can include in the explain output? http://gerrit.cloudera.org:8080/#/c/23219/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test: http://gerrit.cloudera.org:8080/#/c/23219/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test@52 PS5, Line 52: | tuple-ids=5,2,4 row-size=74B cardinality=10.16M cost=0 Should cost perhaps reflect "read processing cost"? I'm not sure how that would impact cost-based planning though. -- To view, visit http://gerrit.cloudera.org:8080/23219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc Gerrit-Change-Number: 23219 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 07 Aug 2025 23:30:22 +0000 Gerrit-HasComments: Yes
