Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

via GitHub Tue, 28 Nov 2023 09:37:52 -0800


zhaih commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1408142129



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -31,18 +33,21 @@
  *
  * @lucene.internal
  */
-public class NeighborArray {
+public class NeighborArray implements Accountable {
+  private static final int INITIAL_CAPACITY = 10;

Review Comment:
   It's ok to grow because it's under lock as @msokolov pointed out. However I 
think we previously avoid growing because we don't want to spend extra time on 
resizing and copy-paste?
   Could you run the knn benchmark to check the performance is ok? (Maybe 
better the multithread merge version to check btw the resizing is safe



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -275,21 +275,15 @@ private void generateLevelToNodes() {
 
   @Override
   public long ramBytesUsed() {
-    long neighborArrayBytes0 =
-        (long) nsize0 * (Integer.BYTES + Float.BYTES)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2L
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF * 2L
-            + Integer.BYTES * 3;
-    long neighborArrayBytes =
-        (long) nsize * (Integer.BYTES + Float.BYTES)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2L
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF * 2L
-            + Integer.BYTES * 3;
     long total = 0;
-    total +=
-        size() * (neighborArrayBytes0 + 
RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)
-            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER; // for graph and level 
0;
-    total += nonZeroLevelSize.get() * neighborArrayBytes; // for non-zero level
+    for (NeighborArray[] neighborArraysPerNode : graph) {
+      if (neighborArraysPerNode != null) {
+        for (NeighborArray neighborArrayPerNodeAndLevel : 
neighborArraysPerNode) {
+          total += neighborArrayPerNodeAndLevel.ramBytesUsed();

Review Comment:
   Hmmm this can make this method potentially a bit slow, it might be ok since 
normally nobody will call it in any critical code path I guess, but it would 
still be better if we can have an estimation that runs faster rather than the 
every accurate account?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

Reply via email to