Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/23930 )
Change subject: IMPALA-14716: Calcite Planner: Make condition estimates more similar to original planner ...................................................................... Patch Set 12: (12 comments) http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@10 PS12, Line 10: little bit more similar to what : is being done with the original Impala planner. The comment here seems incomplete without a reason for doing this. Can you add the rationale here ? I presume it is to avoid regressions. In some selected cases, we should hopefully improve on the original estimates. http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@13 PS12, Line 13: conditions nit: did you mean unknown 'estimates' ? Not sure what unknown 'conditions' means here since the predicates are known patterns. http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@14 PS12, Line 14: functions nit: expressions rather than functions http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@23 PS12, Line 23: XXXXX will be filled in after review nit: Assuming this is created, pls add the jira id. http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java: http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@206 PS11, Line 206: IMPALA-XXXXX I think you already filed a jira for this ? Best to add the number here. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java: http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@138 PS12, Line 138: it will be ignored. It's not ignored from what I can tell .. on line #73 it assigns the Expr.DEFAULT_SELECTIVITY for such cases. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@254 PS12, Line 254: return Math.max(0.0, Math.min(1.0, selectivity)); This would return 0 if the loop is not executed. The default should be 1.0, not 0. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@273 PS12, Line 273: Collections.sort(selectivities); nit: pls add a comment about why sorting the selectivities is needed as it is not obvious unless one is aware of the fact that we want to apply most selective predicates first. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@280 PS12, Line 280: return selectivity; The corresponding Impala estimator applies a final bounds check between [0, 1]: Math.max(0.0, Math.min(1.0, result)); Best to do that here too. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java File java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java: http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@195 PS12, Line 195: //XXX: FIX THIS Unclear why we need the hardcoded value. What can be done in the above calculation to provide the expected value. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@334 PS12, Line 334: 10.0 Where did this constant 10 come from ? if it's the NDV for this column, can you declare it as a static constant in the class so it can be referenced elsewhere too. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@337 PS12, Line 337: distinctRows 'distinctRows' is not used anymore ? Also, I am not sure why we care about distinct row count. We just want the row count for the IN predicate. It's the same row count we would expect for an OR predicate. -- To view, visit http://gerrit.cloudera.org:8080/23930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b9a25259916504296dbd9a9cb9466be8fac8718 Gerrit-Change-Number: 23930 Gerrit-PatchSet: 12 Gerrit-Owner: Steve Carlin <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Tue, 31 Mar 2026 02:58:40 +0000 Gerrit-HasComments: Yes
