Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/23930 )
Change subject: IMPALA-14716: Calcite Planner: Make condition estimates more similar to original planner ...................................................................... Patch Set 12: (12 comments) http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@10 PS12, Line 10: little bit more similar to what : is being done with the original Impala planner. > The comment here seems incomplete without a reason for doing this. Can you Done http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@13 PS12, Line 13: conditions > nit: did you mean unknown 'estimates' ? Not sure what unknown 'conditions' Fixed http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@14 PS12, Line 14: functions > nit: expressions rather than functions Done http://gerrit.cloudera.org:8080/#/c/23930/12//COMMIT_MSG@23 PS12, Line 23: XXXXX will be filled in after review > nit: Assuming this is created, pls add the jira id. Done http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java: http://gerrit.cloudera.org:8080/#/c/23930/11/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@206 PS11, Line 206: IMPALA-XXXXX > I think you already filed a jira for this ? Best to add the number here. Done http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java File java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java: http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@138 PS12, Line 138: it will be ignored. > It's not ignored from what I can tell .. on line #73 it assigns the Expr.DE Changed the comment for clarity http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@254 PS12, Line 254: return Math.max(0.0, Math.min(1.0, selectivity)); > This would return 0 if the loop is not executed. The default should be 1.0 It's a disjunction, so the loop should always be entered. But I did add a preconditions check to ensure that the logic makes sense. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@273 PS12, Line 273: Collections.sort(selectivities); > nit: pls add a comment about why sorting the selectivities is needed as it Done http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/main/java/org/apache/impala/calcite/schema/FilterSelectivityEstimator.java@280 PS12, Line 280: return selectivity; > The corresponding Impala estimator applies a final bounds check between [0, Done http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java File java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java: http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@195 PS12, Line 195: //XXX: FIX THIS > Unclear why we need the hardcoded value. What can be done in the above calc Got rid of the hardcode and provided a description http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@334 PS12, Line 334: 10.0 > Where did this constant 10 come from ? if it's the NDV for this column, can Static constant was declard, just didn't use it :( Fixed now. http://gerrit.cloudera.org:8080/#/c/23930/12/java/calcite-planner/src/test/java/org/apache/impala/calcite/planner/TestCalciteStats.java@337 PS12, Line 337: distinctRows > 'distinctRows' is not used anymore ? Also, I am not sure why we care about I just have the general "distinctRows" test in all the unit tests just to test the right value is coming out. Should be 2 here since there are 2 values in the "in" clause. -- To view, visit http://gerrit.cloudera.org:8080/23930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3b9a25259916504296dbd9a9cb9466be8fac8718 Gerrit-Change-Number: 23930 Gerrit-PatchSet: 12 Gerrit-Owner: Steve Carlin <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Comment-Date: Tue, 31 Mar 2026 21:14:16 +0000 Gerrit-HasComments: Yes
