Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23219 )

Change subject: IMPALA-13437 (part 2): Implement cost-based tuple cache 
placement
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23219/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23219/5//COMMIT_MSG@11
PS5, Line 11: locations within a budget. First, it eliminates unprofitable 
locations
This will need a docs update. Can you file a ticket for documentation around 
tuple caching?


http://gerrit.cloudera.org:8080/#/c/23219/5/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/23219/5/common/thrift/ImpalaService.thrift@1043
PS5, Line 1043:   // cost_based: Pick locations that provide cost improvements 
while imposing a limit
nit: I'd swap the order of these to match their enum order.


http://gerrit.cloudera.org:8080/#/c/23219/5/fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java:

http://gerrit.cloudera.org:8080/#/c/23219/5/fe/src/main/java/org/apache/impala/planner/TupleCacheCostBasedPolicy.java@99
PS5, Line 99:       LOG.trace(node.getDisplayLabel() + " eliminated due to 
missing statistics");
So we won't use tuple caching on locations that are missing statistics? We 
should call that out in docs when we write them.

Could possibly log this at info level, but could get verbose. Maybe consider 
updating the warning about missing statistics in the profile when tuple caching 
is enabled?

Should all the false outcomes result an an ineligibility reason we can include 
in the explain output?


http://gerrit.cloudera.org:8080/#/c/23219/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test:

http://gerrit.cloudera.org:8080/#/c/23219/5/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_tuple_cache/tpcds-q01.test@52
PS5, Line 52: |  tuple-ids=5,2,4 row-size=74B cardinality=10.16M cost=0
Should cost perhaps reflect "read processing cost"? I'm not sure how that would 
impact cost-based planning though.



--
To view, visit http://gerrit.cloudera.org:8080/23219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc6e7b95621a7937d892511dc879bf7c8da07cdc
Gerrit-Change-Number: 23219
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 07 Aug 2025 23:30:22 +0000
Gerrit-HasComments: Yes

Reply via email to