Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21483 )
Change subject: IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java: http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@428 PS2, Line 428: Objects.hash(super.localHash(), op_); > I think this should also include isInferred_ and betweenExprId_ Why? Those aren't included in localEquals. hashCode can allow duplicates, it distinguishes them with equals. But if two objects that would satisfy equals end up in different hash buckets, the hash table doesn't work. http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java: http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java@204 PS2, Line 204: isDecode() > Should this be decodeExpr_? No, needs to mirror localEquals. http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java: http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@123 PS2, Line 123: Objects.hash(super.localHash(), op_); > Include betweenSelectivity_? No, needs to mirror localEquals. http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java: http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@48 PS1, Line 48: HashMap > Question: is 2 List + 1 HashMap still faster for expression substitution ra Using ArrayLists the way we access them likely is faster than LinkedHashMap. Not as sure of this in Java vs C++, but I wouldn't expect it to be slower. http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@204 PS1, Line 204: Preconditions.checkState(substitutions_.size() == lh > Move this after L207? The Preconditions are probably light enough to evalua Done http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/OrderByElement.java File fe/src/main/java/org/apache/impala/analysis/OrderByElement.java: http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/OrderByElement.java@105 PS2, Line 105: public int hashCode() { : return Objects.hash(getClass(), nullsFirstParam_, expr_, isAsc_); : } > Is expr_ included here because it it not child expression? It's meant to capture all the objects required to be equal in "equals". http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@358 PS2, Line 358: if (label_ != null) return label_.toLowerCase().hashCode(); : return super.localHash(); > I think rawPath_ is important to hash for nested type column? Hmm, I'm not sure. As described in https://docs.oracle.com/javase%2F8%2Fdocs%2Fapi%2F%2F/java/lang/Object.html#hashCode-- - If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. - It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. That suggests that using rawPath_ in hashCode and label_ in equals may have only circumstantially worked. I think a dummy SlotRef could have both desc_ and rawPath_ be null, while label_ isn't. The equals implementation is weird here, in that if desc_ is not null only desc_ has to match, which is why the hashCode had to be written this way rather than a hash of all relevant objects. http://gerrit.cloudera.org:8080/#/c/21483/2/testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test File testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test: http://gerrit.cloudera.org:8080/#/c/21483/2/testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test@4 PS2, Line 4: > Replace tabs with spaces? Sure. I just copied over the original file Joe uploaded. -- To view, visit http://gerrit.cloudera.org:8080/21483 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe Gerrit-Change-Number: 21483 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Thu, 06 Jun 2024 21:32:01 +0000 Gerrit-HasComments: Yes
