ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264012455
##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
}
List<Object[]> container = rightBlock.getContainer();
// put all the rows into corresponding hash collections keyed by the key
selector function.
+ int initialHeuristicSize = 16;
for (Object[] row : container) {
- List<Object[]> hashCollection =
- _broadcastRightTable.computeIfAbsent(new
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+ ArrayList<Object[]> hashCollection =
Review Comment:
This loop could likely be called 100s of millions of times per second for
moderately high qps (10-100). Would adding an additional branch here offset the
benefits of running `ensureCapacity`?
Also do you know if the JDK can do loop unrolling here?
##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
}
List<Object[]> container = rightBlock.getContainer();
// put all the rows into corresponding hash collections keyed by the key
selector function.
+ int initialHeuristicSize = 16;
for (Object[] row : container) {
- List<Object[]> hashCollection =
- _broadcastRightTable.computeIfAbsent(new
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+ ArrayList<Object[]> hashCollection =
+ _broadcastRightTable.computeIfAbsent(new
Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));
Review Comment:
Running `new Key()` would create a new object in each iteration. If this
loop is run millions of times per second, and assuming an object reference is
at least 8 bytes, then this could add MBs of data on the TLAB which would
likely lead to slow allocations, and also these would be unnecessary
short-lived objects.
Do you think we can change the FieldSelector to return a `Key` directly?
Or we could re-use the `Key` object and add a setter to it, so that we reuse
the reference and only update the inner reference with a new reference.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]