[ 
https://issues.apache.org/jira/browse/IMPALA-13045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914372#comment-17914372
 ] 

ASF subversion and git services commented on IMPALA-13045:
----------------------------------------------------------

Commit c298c542621cb58ffe0772bf29ebdf7316cb77d1 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c298c5426 ]

IMPALA-13644: Generalize and move getPerInstanceNdvForCpuCosting

getPerInstanceNdvForCpuCosting is a method to estimate the number of
distinct values of exprs per fragment instance when accounting for the
likelihood of duplicate keys across fragment instances. It borrows the
probabilistic model described in IMPALA-2945. This method is exclusively
used by AggregationNode only.

getPerInstanceNdvForCpuCosting run the probabilistic formula
individually for each grouping expression and then multiply it together.
That match with how we estimate group NDV in the past where we simply do
NDV multiplication of each grouping expression.

Recently, we adds tuple-based analysis to lower cardinality estimate for
all kind of aggregation node (IMPALA-13045, IMPALA-13465, IMPALA-13086).
All of the bounding happens in AggregationNode.computeStats(), where we
call estimateNumGroups() function that returns globalNdv estimate for
specific aggregation class.

To take advantage from that more precise globalNdv, this patch replace
getPerInstanceNdvForCpuCosting() with estimatePreaggCardinality() that
apply the probabilistic formula over this single globalNdv number rather
than the old way where it often return an overestimated number from NDV
multiplication method. Its use is still limited only to calculate
ProcessingCost. Using it for preagg output cardinality will be done by
IMPALA-2945.

estimatePreaggCardinality is skipped if data partition of input is a
subset of grouping expression.

Testing:
- Run and pass PlannerTest that set COMPUTE_PROCESSING_COST=True.
  ProcessingCost changes, but all cardinality number stays.
- Add CardinalityTest#testEstimatePreaggCardinality.
- Update test_executor_groups.py. Enable v2 profile as well for easier
  runtime profile debugging.

Change-Id: Iddf75833981558fe0188ea7475b8d996d66983c1
Reviewed-on: http://gerrit.cloudera.org:8080/22320
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Fix intermittent failure in TestQueryLive.test_local_catalog
> ------------------------------------------------------------
>
>                 Key: IMPALA-13045
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13045
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Michael Smith
>            Assignee: Michael Smith
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> IMPALA-13005 introduced {{drop table sys.impala_query_live}}. In some test 
> environments (notably testing with Ozone), recreating that table in the 
> following test - test_local_catalog - does not occur before running the test 
> case portion that attempts to query that table.
> Update the test to wait for the table to be available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to