[ https://issues.apache.org/jira/browse/IMPALA-13437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021270#comment-18021270 ]
ASF subversion and git services commented on IMPALA-13437:
----------------------------------------------------------
Commit 3181fe18006e392e0ce3f2f48fe285569ccfd148 in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3181fe180 ]
IMPALA-13437 (part 1): Compute processing cost before TupleCachePlanner
This is a preparatory change for cost-based placement of
TupleCacheNodes. It reorders planning so that the processing cost and
filtered cardinality are calculated before the TupleCachePlanner runs.

The processing cost is now computed when enable_tuple_cache=true, and
the cost information is displayed in the explain plan output when
enable_tuple_cache=true. This does not affect the adjustment of
fragment parallelism, which continues to be controlled by the
compute_processing_cost option.

The processing cost is used to calculate a cumulative processing cost
in the TupleCacheInfo: the total processing cost of everything below
that point in the plan, including other fragments. It is an indicator
of how much processing a cache hit could avoid. The cost is not
accumulated when a TupleCacheInfo is merged due to a runtime filter,
as that cost is not actually being avoided. This change also computes
the estimated serialized size for the TupleCacheNode based on the
filtered cardinality and the row size.
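
The cumulative cost and size estimate described above can be sketched
roughly as follows. This is an illustration only, not the code from the
commit: PlanNode, runtime_filter_sources, cumulative_cost and
estimated_serialized_size are hypothetical names, and the size formula
(filtered cardinality times average row size) is an assumption about
how the two inputs are combined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a plan node; not Impala's actual class."""
    name: str
    processing_cost: int        # cost of this node alone
    filtered_cardinality: int   # estimated rows after filtering
    avg_row_size: float         # estimated bytes per row
    children: List["PlanNode"] = field(default_factory=list)
    # Subtrees whose cache info is merged in only because they build a
    # runtime filter that this node consumes (assumed modelling choice).
    runtime_filter_sources: List["PlanNode"] = field(default_factory=list)


def cumulative_cost(node: PlanNode) -> int:
    """Cost of this node plus everything below it in the plan.

    Runtime filter sources are deliberately skipped: a cache hit at this
    node does not avoid the work of building the filter.
    """
    return node.processing_cost + sum(cumulative_cost(c) for c in node.children)


def estimated_serialized_size(node: PlanNode) -> int:
    """Assumed estimate: filtered cardinality times average row size."""
    return int(node.filtered_cardinality * node.avg_row_size)


# A small made-up plan: an aggregation over a scan that consumes a
# runtime filter produced by another scan.
build_scan = PlanNode("build_scan", 50, 100, 8.0)
scan = PlanNode("scan", 100, 1000, 16.0, runtime_filter_sources=[build_scan])
agg = PlanNode("agg", 20, 10, 24.0, children=[scan])
```

In this made-up plan, the aggregation's cumulative cost counts the scan
beneath it but not build_scan, since that work happens whether or not
the cache is hit.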
Testing:
- Ran a core job
Change-Id: If78f5d002b0e079eef1eece612f0d4fefde545c7
Reviewed-on: http://gerrit.cloudera.org:8080/23164
Reviewed-by: Yida Wu
Reviewed-by: Michael Smith
Tested-by: Michael Smith
> Improve heuristics for placing the tuple cache nodes
> ----------------------------------------------------
>
> Key: IMPALA-13437
> URL: https://issues.apache.org/jira/browse/IMPALA-13437
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> Improve placement of tuple cache nodes by considering:
> # Selectivity
> # Result Size
> # Operator cost
> # Data change frequency (maybe followup)
> # Etc
> This should avoid caching large results that don't have a major performance
> improvement.