ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264012455


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key 
selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new 
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =

Review Comment:
   This loop could likely be called 100s of millions of times per second for 
moderately high qps (10-100). Would adding an additional branch here offset the 
benefits of running `ensureCapacity`?
   
   Also do you know if the JDK can do loop unrolling here?



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key 
selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new 
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =
+            _broadcastRightTable.computeIfAbsent(new 
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));

Review Comment:
   Running `new Key()` would create a new object in each iteration. If this 
loop is run millions of times per second, and assuming an object reference is 
at least 8 bytes, then this could add MBs of data on the TLAB which would 
likely lead to slow allocations, and also these would be unnecessary 
short-lived objects.
   
   Do you think we can change the FieldSelector to return a `Key` directly?
   
   Or we could re-use the `Key` object and add a setter to it, so that we reuse 
the reference and only update the inner reference with a new reference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to