gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265085660


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =

Review Comment:
   I wouldn't worry much about adding a conditional here, given the complexity of [HashMap.computeIfAbsent](https://github.com/openjdk/jdk/blob/acf591e856ce4b43303b1578bd64a8c9ab0063ea/src/java.base/share/classes/java/util/HashMap.java#L1195).
   
   > Also do you know if the JDK can do loop unrolling here?
   
   I don't know, but I would guess it doesn't. What we do here is too complex: for each row we create a new `Key` instance that copies some data from an array (in another loop), then we look up that new object in the map, and if the value is not there we call a lambda to create the value for that key. After that we just add the element to the list.
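
For reference, the per-row work described above can be sketched roughly as follows. This is a simplified illustration, not the actual operator code: `CopiedKey`, `BuildLoopSketch` and the `columnIndices` parameter are hypothetical stand-ins for `Key`, the operator and the key selector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the operator's Key class: wraps a copied array of key values.
final class CopiedKey {
  private final Object[] _values;

  CopiedKey(Object[] values) {
    _values = values;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CopiedKey && Arrays.equals(_values, ((CopiedKey) o)._values);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(_values);
  }
}

final class BuildLoopSketch {
  static Map<CopiedKey, List<Object[]>> build(List<Object[]> rows, int[] columnIndices) {
    Map<CopiedKey, List<Object[]>> table = new HashMap<>();
    for (Object[] row : rows) {
      // Step 1: an inner loop copies the key columns into a fresh array.
      Object[] keyValues = new Object[columnIndices.length];
      for (int i = 0; i < columnIndices.length; i++) {
        keyValues[i] = row[columnIndices[i]];
      }
      // Steps 2 and 3: allocate the key and look it up; the lambda only runs when the key is absent.
      List<Object[]> bucket = table.computeIfAbsent(new CopiedKey(keyValues), k -> new ArrayList<>());
      // Step 4: append the row to the bucket for that key.
      bucket.add(row);
    }
    return table;
  }
}
```

Each iteration therefore has an allocation, an inner loop, a hash lookup and a possible lambda call, which is why I would not expect the JIT to unroll it.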
   
   We can try to apply some extra optimizations here. For example, we could use a lightweight version of `Key` that does not copy the array of keys but instead holds a reference to the row and to the same `_columnIndices` we use right now, and uses them to calculate `hashCode` and `equals`. That way we wouldn't need to create heavier instances for each row. The main drawbacks of this approach are that `hashCode` and `equals` would be a bit slower and that we would need to keep a reference to the original row. But the latter can be optimized further.
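
A minimal sketch of what such a lightweight key could look like, assuming rows are plain `Object[]` arrays. `RowKeyView` and `LightweightKeySketch` are hypothetical names used only for illustration; nothing here exists in the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical lightweight key: a view over a row, no copy of the key values.
final class RowKeyView {
  private final Object[] _row;
  private final int[] _columnIndices;

  RowKeyView(Object[] row, int[] columnIndices) {
    _row = row;
    _columnIndices = columnIndices;
  }

  @Override
  public int hashCode() {
    // Hash only the key columns, read through the shared index array.
    int result = 1;
    for (int index : _columnIndices) {
      result = 31 * result + Objects.hashCode(_row[index]);
    }
    return result;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof RowKeyView)) {
      return false;
    }
    RowKeyView other = (RowKeyView) o;
    if (_columnIndices.length != other._columnIndices.length) {
      return false;
    }
    // Compare key columns position by position; each side uses its own indices.
    for (int i = 0; i < _columnIndices.length; i++) {
      if (!Objects.equals(_row[_columnIndices[i]], other._row[other._columnIndices[i]])) {
        return false;
      }
    }
    return true;
  }
}

final class LightweightKeySketch {
  static Map<RowKeyView, List<Object[]>> build(List<Object[]> rows, int[] columnIndices) {
    Map<RowKeyView, List<Object[]>> table = new HashMap<>();
    for (Object[] row : rows) {
      // The only per-row allocation for the key is the small view object.
      table.computeIfAbsent(new RowKeyView(row, columnIndices), k -> new ArrayList<>()).add(row);
    }
    return table;
  }
}
```

The trade-off is visible in the sketch: every `hashCode` and `equals` call goes through the index array, and each key stored in the map keeps its row reachable. Because the view carries its own indices, a probe row from the other side of the join can be looked up with a different index array, as long as the key columns line up in order.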


