Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21483 )

Change subject: IMPALA-12800: Use HashMap for ExprSubstitutionMap lookups
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@428
PS2, Line 428: Objects.hash(super.localHash(), op_);
> I think this should also include isInferred_ and betweenExprId_
Why? Those aren't included in localEquals.

hashCode can allow duplicates, it distinguishes them with equals. But if two 
objects that would satisfy equals end up in different hash buckets, the hash 
table doesn't work.


http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
File fe/src/main/java/org/apache/impala/analysis/CaseExpr.java:

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CaseExpr.java@204
PS2, Line 204: isDecode()
> Should this be decodeExpr_?
No, needs to mirror localEquals.


http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java:

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@123
PS2, Line 123: Objects.hash(super.localHash(), op_);
> Include betweenSelectivity_?
No, needs to mirror localEquals.


http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java
File fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java:

http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@48
PS1, Line 48: HashMap
> Question: is 2 List + 1 HashMap still faster for expression substitution ra
Using ArrayLists the way we access them likely is faster than LinkedHashMap. 
Not as sure of this in Java vs C++, but I wouldn't expect it to be slower.


http://gerrit.cloudera.org:8080/#/c/21483/1/fe/src/main/java/org/apache/impala/analysis/ExprSubstitutionMap.java@204
PS1, Line 204: Preconditions.checkState(substitutions_.size() == lh
> Move this after L207? The Preconditions are probably light enough to evalua
Done


http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/OrderByElement.java
File fe/src/main/java/org/apache/impala/analysis/OrderByElement.java:

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/OrderByElement.java@105
PS2, Line 105: public int hashCode() {
             :     return Objects.hash(getClass(), nullsFirstParam_, expr_, 
isAsc_);
             :   }
> Is expr_ included here because it it not child expression?
It's meant to capture all the objects required to be equal in "equals".


http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/21483/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@358
PS2, Line 358: if (label_ != null) return label_.toLowerCase().hashCode();
             :     return super.localHash();
> I think rawPath_ is important to hash for nested type column?
Hmm, I'm not sure. As described in 
https://docs.oracle.com/javase%2F8%2Fdocs%2Fapi%2F%2F/java/lang/Object.html#hashCode--
- If two objects are equal according to the equals(Object) method, then calling 
the hashCode method on each of the two objects must produce the same integer 
result.
- It is not required that if two objects are unequal according to the 
equals(java.lang.Object) method, then calling the hashCode method on each of 
the two objects must produce distinct integer results.

That suggests that using rawPath_ in hashCode and label_ in equals may have 
only circumstantially worked. I think a dummy SlotRef could have both desc_ and 
rawPath_ be null, while label_ isn't.

The equals implementation is weird here, in that if desc_ is not null only 
desc_ has to match, which is why the hashCode had to be written this way rather 
than a hash of all relevant objects.


http://gerrit.cloudera.org:8080/#/c/21483/2/testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test:

http://gerrit.cloudera.org:8080/#/c/21483/2/testdata/workloads/functional-planner/queries/PlannerTest/many-expression.test@4
PS2, Line 4:
> Replace tabs with spaces?
Sure. I just copied over the original file Joe uploaded.



--
To view, visit http://gerrit.cloudera.org:8080/21483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic538a82c69ee1dd76981fbacf95289c9d00ea9fe
Gerrit-Change-Number: 21483
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Thu, 06 Jun 2024 21:32:01 +0000
Gerrit-HasComments: Yes

Reply via email to