Re: [PR] OAK-10966 - Indexing job: create optimized version of PersistedLinkedList [jackrabbit-oak]

via GitHub Mon, 29 Jul 2024 00:07:42 -0700


thomasmueller commented on code in PR #1595:
URL: https://github.com/apache/jackrabbit-oak/pull/1595#discussion_r1694684711



##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/linkedList/PersistedLinkedListV2.java:
##########
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.jackrabbit.guava.common.base.Preconditions;
+import org.apache.jackrabbit.oak.commons.IOUtils;
+import org.apache.jackrabbit.oak.index.indexer.document.NodeStateEntry;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryReader;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryWriter;
+import org.h2.mvstore.MVMap;
+import org.h2.mvstore.MVStore;
+import org.h2.mvstore.MVStoreTool;
+import org.jetbrains.annotations.NotNull;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Iterator;
+
+/**
+ * A persistent linked list that internally uses the MVStore. This list keeps 
an in-memory cache, writing to the
+ * persistent store the nodes only when the cache is full. The in-memory cache 
is limited by two parameters:
+ *
+ * <ul>
+ *     <li>cacheSize: the maximum number of elements to keep in the in-memory 
cache</li>
+ *     <li>cacheSizeMB: the maximum size of the in-memory cache in MB</li>
+ * </ul>
+ * <p>
+ * The recommended configuration is to rely on the total memory usage to limit 
the cache, giving as much memory as
+ * available in the JVM, and setting a very high limit for the number of 
elements. A cache miss has a very high cost,
+ * so it should be avoided as much as possible.
+ * <p>
+ * <p>
+ * Each element is stored either in the cache or in the persistent store, but 
not in both. And elements are not moved
+ * between the two tiers, so even if there is a cache miss, that element will 
remain in the persistent store.
+ * For the access pattern of the indexer, this policy has a lower rate of 
cache misses than if we move to the cache an
+ * element after a miss.
+ * <p>
+ * To understand why, let's assume we want to traverse the children of a node 
P that is at line/position 100 in the FFS.
+ * When we call getChildren on P, this creates an iterator that scans from 
position 100 for all the children.
+ * If we call recursively getChildren on a child C of P, this will also create 
a new iterator that will start also at
+ * position 100. Therefore, the iterators will frequently scan from 100 down. 
Let's assume the cache can only hold 10
+ * nodes and that we use a policy of moving to the cache the last node that 
was accessed. Then, if an iterator scans
+ * from 100 to, let's say 150, when it finishes iterating, the nodes 141 to 
150 will be the only ones in the cache.
+ * The next iterator that is scanning from 100 will have cache misses for all 
the nodes until 140. And this will repeat
+ * for every new iterator. On the other hand, if we keep in the cache the 
nodes from 100 to 109, every iterator starting
+ * from 100 will at least have 10 cache hits, which is better than having a 
cache miss for all elements.
+ */
+public class PersistedLinkedListV2 implements NodeStateEntryList {
+
+    private final static Logger LOG = 
LoggerFactory.getLogger(PersistedLinkedListV2.class);
+
+    private static final String COMPACT_STORE_MILLIS_NAME = 
"oak.indexer.linkedList.compactMillis";
+
+    private final HashMap<Long, NodeStateEntry> cache = new HashMap<>(512);
+    private final int compactStoreMillis = 
Integer.getInteger(COMPACT_STORE_MILLIS_NAME, 60 * 1000);
+    private final NodeStateEntryWriter writer;
+    private final NodeStateEntryReader reader;
+    private final String storeFileName;
+    private final long cacheSizeLimitBytes;
+    private final long cacheSizeLimit;
+
+    private MVStore store;
+    private MVMap<Long, String> map;
+    private long headIndex;
+    private long tailIndex;
+    // Total entries in the list
+    private long totalEntries;
+    private long lastLog;
+    private long lastCompact;
+
+    // Estimation of the cache size
+    private long cacheSizeEstimationBytes;
+
+    // Metrics
+    private long cacheHits;
+    private long cacheMisses;
+    private long storeWrites; // Each cache miss is a read from the store, so 
no need for a storeRead counter
+    private long peakCacheSizeBytes;
+    private long peakCacheSize;
+
+    /**
+     * @param cacheSize   the maximum number of elements to keep in the 
in-memory cache
+     * @param cacheSizeMB the maximum size of the in-memory cache in MB
+     */
+    public PersistedLinkedListV2(String fileName, NodeStateEntryWriter writer, 
NodeStateEntryReader reader, int cacheSize, int cacheSizeMB) {
+        this.cacheSizeLimit = cacheSize;
+        this.cacheSizeLimitBytes = ((long) cacheSizeMB) * 1024 * 1024;
+        this.storeFileName = fileName;
+        LOG.info("Opening store {}", fileName);
+        File oldFile = new File(fileName);
+        if (oldFile.exists()) {
+            LOG.info("Deleting {}", fileName);
+            try {
+                FileUtils.forceDelete(oldFile);
+            } catch (IOException e) {
+                throw new IllegalStateException(e);
+            }
+        }
+        openStore();
+        this.writer = writer;
+        this.reader = reader;
+        lastCompact = System.currentTimeMillis();
+    }
+
+    private void openStore() {
+        store = MVStore.open(storeFileName);
+        map = store.openMap("list");
+    }
+
+    @Override
+    public void add(@NotNull NodeStateEntry item) {
+        Preconditions.checkArgument(item != null, "Can't add null to the 
list");
+        Long index = tailIndex++;
+        addEntryToCache(index, item);
+    }
+
+    @Override
+    public boolean isEmpty() {
+        return totalEntries == 0;
+    }
+
+    @Override
+    public Iterator<NodeStateEntry> iterator() {
+        return new NodeIterator(headIndex);
+    }
+
+    @Override
+    public NodeStateEntry remove() {
+        Preconditions.checkState(!isEmpty(), "Cannot remove item from empty 
list");
+        Long boxedHeadIndex = headIndex;
+        NodeStateEntry entryRemoved = cache.remove(boxedHeadIndex);
+        if (entryRemoved == null) {
+            String mapEntry = map.remove(boxedHeadIndex);
+            if (mapEntry == null) {
+                throw new IllegalStateException("Entry not found in cache or 
in store: " + boxedHeadIndex);
+            }
+            cacheMisses++;
+            entryRemoved = reader.read(mapEntry);
+        } else {
+            cacheHits++;
+            cacheSizeEstimationBytes -= entryRemoved.estimatedMemUsage();
+        }
+
+        headIndex++;
+        totalEntries--;
+        if (totalEntries == 0) {
+            map.clear();

Review Comment:
   We could log a warning here if cacheSizeEstimationBytes != 0
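
A minimal sketch of that check, assuming a simplified stand-in class (`CacheAccountingSketch` and its `warnings` counter are hypothetical, and `java.util.logging` replaces slf4j so the example is self-contained). Resetting the estimate to 0 after warning is an extra assumption, not something the PR does:

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the bookkeeping in PersistedLinkedListV2.remove();
// only the counters relevant to the size estimate are modelled.
class CacheAccountingSketch {
    private static final Logger LOG = Logger.getLogger(CacheAccountingSketch.class.getName());

    long totalEntries;
    long cacheSizeEstimationBytes;
    int warnings; // counts drift warnings so the behaviour can be observed

    void add(long estimatedBytes) {
        totalEntries++;
        cacheSizeEstimationBytes += estimatedBytes;
    }

    void removeCached(long estimatedBytes) {
        cacheSizeEstimationBytes -= estimatedBytes;
        totalEntries--;
        if (totalEntries == 0 && cacheSizeEstimationBytes != 0) {
            // The list is empty, so any non-zero estimate means the accounting drifted.
            warnings++;
            LOG.warning("List is empty but cache size estimate is "
                    + cacheSizeEstimationBytes + " bytes; resetting to 0");
            cacheSizeEstimationBytes = 0; // assumption: reset so the drift does not accumulate
        }
    }
}
```

A warning of this kind would make a bug in estimatedMemUsage() visible in the logs instead of silently shrinking or growing the effective cache.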



##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/linkedList/PersistedLinkedListV2.java:
##########
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.jackrabbit.guava.common.base.Preconditions;
+import org.apache.jackrabbit.oak.commons.IOUtils;
+import org.apache.jackrabbit.oak.index.indexer.document.NodeStateEntry;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryReader;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryWriter;
+import org.h2.mvstore.MVMap;
+import org.h2.mvstore.MVStore;
+import org.h2.mvstore.MVStoreTool;
+import org.jetbrains.annotations.NotNull;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Iterator;
+
+/**
+ * A persistent linked list that internally uses the MVStore. This list keeps 
an in-memory cache, writing to the
+ * persistent store the nodes only when the cache is full. The in-memory cache 
is limited by two parameters:
+ *
+ * <ul>
+ *     <li>cacheSize: the maximum number of elements to keep in the in-memory 
cache</li>
+ *     <li>cacheSizeMB: the maximum size of the in-memory cache in MB</li>
+ * </ul>
+ * <p>
+ * The recommended configuration is to rely on the total memory usage to limit 
the cache, giving as much memory as
+ * available in the JVM, and setting a very high limit for the number of 
elements. A cache miss has a very high cost,
+ * so it should be avoided as much as possible.
+ * <p>
+ * <p>
+ * Each element is stored either in the cache or in the persistent store, but 
not in both. And elements are not moved
+ * between the two tiers, so even if there is a cache miss, that element will 
remain in the persistent store.
+ * For the access pattern of the indexer, this policy has a lower rate of 
cache misses than if we move to the cache an
+ * element after a miss.
+ * <p>
+ * To understand why, let's assume we want to traverse the children of a node 
P that is at line/position 100 in the FFS.
+ * When we call getChildren on P, this creates an iterator that scans from 
position 100 for all the children.
+ * If we call recursively getChildren on a child C of P, this will also create 
a new iterator that will start also at
+ * position 100. Therefore, the iterators will frequently scan from 100 down. 
Let's assume the cache can only hold 10
+ * nodes and that we use a policy of moving to the cache the last node that 
was accessed. Then, if an iterator scans
+ * from 100 to, let's say 150, when it finishes iterating, the nodes 141 to 
150 will be the only ones in the cache.
+ * The next iterator that is scanning from 100 will have cache misses for all 
the nodes until 140. And this will repeat
+ * for every new iterator. On the other hand, if we keep in the cache the 
nodes from 100 to 109, every iterator starting
+ * from 100 will at least have 10 cache hits, which is better than having a 
cache miss for all elements.
+ */
+public class PersistedLinkedListV2 implements NodeStateEntryList {
+
+    private final static Logger LOG = 
LoggerFactory.getLogger(PersistedLinkedListV2.class);
+
+    private static final String COMPACT_STORE_MILLIS_NAME = 
"oak.indexer.linkedList.compactMillis";
+
+    private final HashMap<Long, NodeStateEntry> cache = new HashMap<>(512);
+    private final int compactStoreMillis = 
Integer.getInteger(COMPACT_STORE_MILLIS_NAME, 60 * 1000);
+    private final NodeStateEntryWriter writer;
+    private final NodeStateEntryReader reader;
+    private final String storeFileName;
+    private final long cacheSizeLimitBytes;
+    private final long cacheSizeLimit;
+
+    private MVStore store;
+    private MVMap<Long, String> map;
+    private long headIndex;
+    private long tailIndex;
+    // Total entries in the list
+    private long totalEntries;
+    private long lastLog;
+    private long lastCompact;
+
+    // Estimation of the cache size
+    private long cacheSizeEstimationBytes;
+
+    // Metrics
+    private long cacheHits;
+    private long cacheMisses;
+    private long storeWrites; // Each cache miss is a read from the store, so 
no need for a storeRead counter
+    private long peakCacheSizeBytes;
+    private long peakCacheSize;
+
+    /**
+     * @param cacheSize   the maximum number of elements to keep in the 
in-memory cache
+     * @param cacheSizeMB the maximum size of the in-memory cache in MB
+     */
+    public PersistedLinkedListV2(String fileName, NodeStateEntryWriter writer, 
NodeStateEntryReader reader, int cacheSize, int cacheSizeMB) {
+        this.cacheSizeLimit = cacheSize;
+        this.cacheSizeLimitBytes = ((long) cacheSizeMB) * 1024 * 1024;
+        this.storeFileName = fileName;
+        LOG.info("Opening store {}", fileName);
+        File oldFile = new File(fileName);
+        if (oldFile.exists()) {
+            LOG.info("Deleting {}", fileName);
+            try {
+                FileUtils.forceDelete(oldFile);
+            } catch (IOException e) {
+                throw new IllegalStateException(e);
+            }
+        }
+        openStore();
+        this.writer = writer;
+        this.reader = reader;
+        lastCompact = System.currentTimeMillis();
+    }
+
+    private void openStore() {
+        store = MVStore.open(storeFileName);
+        map = store.openMap("list");
+    }
+
+    @Override
+    public void add(@NotNull NodeStateEntry item) {
+        Preconditions.checkArgument(item != null, "Can't add null to the 
list");
+        Long index = tailIndex++;
+        addEntryToCache(index, item);
+    }
+
+    @Override
+    public boolean isEmpty() {
+        return totalEntries == 0;
+    }
+
+    @Override
+    public Iterator<NodeStateEntry> iterator() {
+        return new NodeIterator(headIndex);
+    }
+
+    @Override
+    public NodeStateEntry remove() {
+        Preconditions.checkState(!isEmpty(), "Cannot remove item from empty 
list");
+        Long boxedHeadIndex = headIndex;
+        NodeStateEntry entryRemoved = cache.remove(boxedHeadIndex);
+        if (entryRemoved == null) {
+            String mapEntry = map.remove(boxedHeadIndex);
+            if (mapEntry == null) {
+                throw new IllegalStateException("Entry not found in cache or 
in store: " + boxedHeadIndex);
+            }
+            cacheMisses++;
+            entryRemoved = reader.read(mapEntry);
+        } else {
+            cacheHits++;
+            cacheSizeEstimationBytes -= entryRemoved.estimatedMemUsage();
+        }
+
+        headIndex++;
+        totalEntries--;
+        if (totalEntries == 0) {
+            map.clear();
+            cache.clear();
+        }
+        return entryRemoved;
+    }
+
+    private NodeStateEntry get(Long index) {
+        NodeStateEntry result = cache.get(index);
+        if (result == null) {
+            cacheMisses++;
+            String s = map.get(index);
+            result = reader.read(s);
+            LOG.trace("Cache miss: {}={}", index, result.getPath());
+        } else {
+            cacheHits++;
+        }
+        return result;
+    }
+
+    private void addEntryToCache(Long index, NodeStateEntry entry) {
+        long now = System.currentTimeMillis();
+        long newCacheSizeBytes = cacheSizeEstimationBytes + entry.estimatedMemUsage();
+        if (cache.size() == cacheSizeLimit || newCacheSizeBytes > cacheSizeLimitBytes) {

Review Comment:
   I would use the more conservative "cache.size() >= cacheSizeLimit".
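
To illustrate the difference, here is a small sketch with the two variants of the check as standalone predicates (the class and method names are hypothetical; the parameters mirror the fields in the diff). If the cache ever ends up above the limit, the "==" variant never triggers again, while ">=" still does:

```java
// Hypothetical helper that isolates the "is the cache full?" decision.
class CacheLimitCheckSketch {

    // Variant from the PR: only detects the exact moment the limit is reached.
    static boolean isFullEquals(int size, long sizeLimit, long newBytes, long bytesLimit) {
        return size == sizeLimit || newBytes > bytesLimit;
    }

    // More conservative variant: also detects a cache that is already over the limit.
    static boolean isFullAtLeast(int size, long sizeLimit, long newBytes, long bytesLimit) {
        return size >= sizeLimit || newBytes > bytesLimit;
    }
}
```

With a limit of 10 entries and a cache that somehow holds 11, isFullEquals(11, 10, ...) reports "not full" as long as the byte limit is not exceeded, whereas isFullAtLeast(11, 10, ...) reports "full".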



##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/linkedList/PersistedLinkedListV2.java:
##########
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.jackrabbit.guava.common.base.Preconditions;
+import org.apache.jackrabbit.oak.commons.IOUtils;
+import org.apache.jackrabbit.oak.index.indexer.document.NodeStateEntry;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryReader;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.NodeStateEntryWriter;
+import org.h2.mvstore.MVMap;
+import org.h2.mvstore.MVStore;
+import org.h2.mvstore.MVStoreTool;
+import org.jetbrains.annotations.NotNull;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Iterator;
+
+/**
+ * A persistent linked list that internally uses the MVStore. This list keeps 
an in-memory cache, writing to the
+ * persistent store the nodes only when the cache is full. The in-memory cache 
is limited by two parameters:
+ *
+ * <ul>
+ *     <li>cacheSize: the maximum number of elements to keep in the in-memory 
cache</li>
+ *     <li>cacheSizeMB: the maximum size of the in-memory cache in MB</li>
+ * </ul>
+ * <p>
+ * The recommended configuration is to rely on the total memory usage to limit 
the cache, giving as much memory as
+ * available in the JVM, and setting a very high limit for the number of 
elements. A cache miss has a very high cost,
+ * so it should be avoided as much as possible.
+ * <p>
+ * <p>
+ * Each element is stored either in the cache or in the persistent store, but 
not in both. And elements are not moved
+ * between the two tiers, so even if there is a cache miss, that element will 
remain in the persistent store.
+ * For the access pattern of the indexer, this policy has a lower rate of 
cache misses than if we move to the cache an
+ * element after a miss.
+ * <p>
+ * To understand why, let's assume we want to traverse the children of a node 
P that is at line/position 100 in the FFS.
+ * When we call getChildren on P, this creates an iterator that scans from 
position 100 for all the children.
+ * If we call recursively getChildren on a child C of P, this will also create 
a new iterator that will start also at
+ * position 100. Therefore, the iterators will frequently scan from 100 down. 
Let's assume the cache can only hold 10
+ * nodes and that we use a policy of moving to the cache the last node that 
was accessed. Then, if an iterator scans
+ * from 100 to, let's say 150, when it finishes iterating, the nodes 141 to 
150 will be the only ones in the cache.
+ * The next iterator that is scanning from 100 will have cache misses for all 
the nodes until 140. And this will repeat
+ * for every new iterator. On the other hand, if we keep in the cache the 
nodes from 100 to 109, every iterator starting
+ * from 100 will at least have 10 cache hits, which is better than having a 
cache miss for all elements.
+ */
+public class PersistedLinkedListV2 implements NodeStateEntryList {
+
+    private final static Logger LOG = 
LoggerFactory.getLogger(PersistedLinkedListV2.class);
+
+    private static final String COMPACT_STORE_MILLIS_NAME = 
"oak.indexer.linkedList.compactMillis";
+
+    private final HashMap<Long, NodeStateEntry> cache = new HashMap<>(512);
+    private final int compactStoreMillis = 
Integer.getInteger(COMPACT_STORE_MILLIS_NAME, 60 * 1000);
+    private final NodeStateEntryWriter writer;
+    private final NodeStateEntryReader reader;
+    private final String storeFileName;
+    private final long cacheSizeLimitBytes;
+    private final long cacheSizeLimit;
+
+    private MVStore store;
+    private MVMap<Long, String> map;
+    private long headIndex;
+    private long tailIndex;
+    // Total entries in the list
+    private long totalEntries;
+    private long lastLog;
+    private long lastCompact;
+
+    // Estimation of the cache size
+    private long cacheSizeEstimationBytes;
+
+    // Metrics
+    private long cacheHits;
+    private long cacheMisses;
+    private long storeWrites; // Each cache miss is a read from the store, so 
no need for a storeRead counter
+    private long peakCacheSizeBytes;
+    private long peakCacheSize;
+
+    /**
+     * @param cacheSize   the maximum number of elements to keep in the 
in-memory cache
+     * @param cacheSizeMB the maximum size of the in-memory cache in MB
+     */
+    public PersistedLinkedListV2(String fileName, NodeStateEntryWriter writer, 
NodeStateEntryReader reader, int cacheSize, int cacheSizeMB) {
+        this.cacheSizeLimit = cacheSize;
+        this.cacheSizeLimitBytes = ((long) cacheSizeMB) * 1024 * 1024;
+        this.storeFileName = fileName;
+        LOG.info("Opening store {}", fileName);
+        File oldFile = new File(fileName);
+        if (oldFile.exists()) {
+            LOG.info("Deleting {}", fileName);
+            try {
+                FileUtils.forceDelete(oldFile);
+            } catch (IOException e) {
+                throw new IllegalStateException(e);
+            }
+        }
+        openStore();
+        this.writer = writer;
+        this.reader = reader;
+        lastCompact = System.currentTimeMillis();
+    }
+
+    private void openStore() {
+        store = MVStore.open(storeFileName);
+        map = store.openMap("list");
+    }
+
+    @Override
+    public void add(@NotNull NodeStateEntry item) {
+        Preconditions.checkArgument(item != null, "Can't add null to the 
list");
+        Long index = tailIndex++;
+        addEntryToCache(index, item);
+    }
+
+    @Override
+    public boolean isEmpty() {
+        return totalEntries == 0;
+    }
+
+    @Override
+    public Iterator<NodeStateEntry> iterator() {
+        return new NodeIterator(headIndex);
+    }
+
+    @Override
+    public NodeStateEntry remove() {
+        Preconditions.checkState(!isEmpty(), "Cannot remove item from empty 
list");
+        Long boxedHeadIndex = headIndex;
+        NodeStateEntry entryRemoved = cache.remove(boxedHeadIndex);
+        if (entryRemoved == null) {
+            String mapEntry = map.remove(boxedHeadIndex);
+            if (mapEntry == null) {
+                throw new IllegalStateException("Entry not found in cache or 
in store: " + boxedHeadIndex);
+            }
+            cacheMisses++;
+            entryRemoved = reader.read(mapEntry);
+        } else {
+            cacheHits++;
+            cacheSizeEstimationBytes -= entryRemoved.estimatedMemUsage();
+        }
+
+        headIndex++;
+        totalEntries--;
+        if (totalEntries == 0) {
+            map.clear();
+            cache.clear();
+        }
+        return entryRemoved;
+    }
+
+    private NodeStateEntry get(Long index) {
+        NodeStateEntry result = cache.get(index);
+        if (result == null) {
+            cacheMisses++;
+            String s = map.get(index);
+            result = reader.read(s);
+            LOG.trace("Cache miss: {}={}", index, result.getPath());
+        } else {
+            cacheHits++;
+        }
+        return result;
+    }
+
+    private void addEntryToCache(Long index, NodeStateEntry entry) {

Review Comment:
   I would use "long" instead of "Long" here, because index is never allowed 
to be null. (I'm aware it can be null in the internal hash map.)
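
A minimal sketch of what that signature could look like, assuming a simplified cache that stores strings instead of NodeStateEntry (the class name `PrimitiveIndexSketch` is hypothetical). The primitive parameter rules out null at compile time, and the value is autoboxed only where the map needs a key:

```java
import java.util.HashMap;

// Hypothetical, simplified cache keyed by list position.
class PrimitiveIndexSketch {
    private final HashMap<Long, String> cache = new HashMap<>();

    // "long" instead of "Long": callers cannot pass null.
    void addEntryToCache(long index, String entry) {
        cache.put(index, entry); // autoboxed to Long here
    }

    String get(long index) {
        return cache.get(index); // returns null if the index is not cached
    }
}
```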



##########
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/FlatFileStoreIterator.java:
##########
@@ -19,34 +19,44 @@
 
 package org.apache.jackrabbit.oak.index.indexer.document.flatfile;
 
-import static org.apache.jackrabbit.guava.common.collect.Iterators.concat;
-import static 
org.apache.jackrabbit.guava.common.collect.Iterators.singletonIterator;
-
-import java.io.Closeable;
-import java.util.Iterator;
-import java.util.Set;
-
+import org.apache.jackrabbit.guava.common.collect.AbstractIterator;
+import org.apache.jackrabbit.oak.commons.IOUtils;
 import org.apache.jackrabbit.oak.index.indexer.document.NodeStateEntry;
 import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList.FlatFileBufferLinkedList;
 import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList.NodeStateEntryList;
 import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList.PersistedLinkedList;
+import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.linkedList.PersistedLinkedListV2;
 import 
org.apache.jackrabbit.oak.index.indexer.document.flatfile.pipelined.ConfigHelper;
 import org.apache.jackrabbit.oak.spi.blob.BlobStore;
 import org.apache.jackrabbit.oak.spi.state.NodeState;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.jackrabbit.guava.common.collect.AbstractIterator;
+import java.io.Closeable;
+import java.util.Iterator;
+import java.util.Set;
+
+import static org.apache.jackrabbit.guava.common.collect.Iterators.concat;
+import static 
org.apache.jackrabbit.guava.common.collect.Iterators.singletonIterator;
 
 class FlatFileStoreIterator extends AbstractIterator<NodeStateEntry> 
implements Iterator<NodeStateEntry>, Closeable {
-    private static final Logger log = 
LoggerFactory.getLogger(FlatFileStoreIterator.class);
+    private static final Logger LOG = 
LoggerFactory.getLogger(FlatFileStoreIterator.class);
 
     static final String BUFFER_MEM_LIMIT_CONFIG_NAME = 
"oak.indexer.memLimitInMB";
     // by default, use the PersistedLinkedList
     private static final int DEFAULT_BUFFER_MEM_LIMIT_IN_MB = 0;
-    static final String PERSISTED_LINKED_LIST_CACHE_SIZE = 
"oak.indexer.persistedLinkedList.cacheSize";
-    static final int DEFAULT_PERSISTED_LINKED_LIST_CACHE_SIZE = 1000;
 
+    public static final String PERSISTED_LINKED_LIST_CACHE_SIZE = "oak.indexer.persistedLinkedList.cacheSize";
+    public static final int DEFAULT_PERSISTED_LINKED_LIST_CACHE_SIZE = 1000;
+
+    public static final String PERSISTED_LINKED_LIST_V2_CACHE_SIZE = "oak.indexer.persistedLinkedListV2.cacheSize";
+    public static final int DEFAULT_PERSISTED_LINKED_LIST_V2_CACHE_SIZE = 10000;
+
+    public static final String PERSISTED_LINKED_LIST_V2_MEMORY_CACHE_SIZE_MB = "oak.indexer.persistedLinkedListV2.cacheMaxSizeMB";
+    public static final int DEFAULT_PERSISTED_LINKED_LIST_V2_MEMORY_CACHE_SIZE_MB = 8;
+
+    public static final String PERSISTED_LINKED_LIST_USE_V2 = "oak.indexer.persistedLinkedList.useV2";

Review Comment:
   As always with caches that have a "size" (entry count) limit rather than a 
"memory" limit, there was a risk of running out of memory. This risk existed 
even before, with the limit of 1000 entries.
   
   Now we have 10'000 entries, but there is an additional "memory" limit of 
8 MB. 8 MB is a bit small in my view; I would use at least 32 MB. What about 
using just the memory limit, and not a size limit? That would simplify the 
code, and simpler code typically has fewer bugs. I understand that, on the 
other hand, if there is a bug in the memory calculation, then there is a risk 
as well. But I think the memory calculation must not have bugs (of the form: 
the estimated memory is smaller than or equal to 0); if it has, then we are in 
trouble anyway.
   
   So I would opt for the simpler code!
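
A rough sketch of a cache limited only by memory, under several assumptions: the class `MemoryBoundedCacheSketch` is hypothetical, a plain HashMap stands in for the MVStore map, entries are strings, and estimatedMemUsage is an invented approximation. Entries that do not fit go straight to the store and are never moved back, as described in the class Javadoc:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-tier list storage with a single, memory-based cache limit.
class MemoryBoundedCacheSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private final Map<Long, String> store = new HashMap<>(); // stand-in for the MVStore map
    private final long cacheSizeLimitBytes;
    private long cacheSizeEstimationBytes;

    MemoryBoundedCacheSketch(int cacheSizeMB) {
        this.cacheSizeLimitBytes = ((long) cacheSizeMB) * 1024 * 1024;
    }

    // Invented estimate: fixed overhead plus two bytes per character.
    static long estimatedMemUsage(String entry) {
        return 48 + 2L * entry.length();
    }

    void add(long index, String entry) {
        long size = estimatedMemUsage(entry);
        if (size <= 0) {
            // Guard against the kind of bug mentioned above: a non-positive estimate.
            throw new IllegalStateException("Invalid memory estimate: " + size);
        }
        if (cacheSizeEstimationBytes + size > cacheSizeLimitBytes) {
            store.put(index, entry); // cache is full: write to the persistent tier
        } else {
            cache.put(index, entry);
            cacheSizeEstimationBytes += size;
        }
    }

    int cachedEntries() {
        return cache.size();
    }

    int storedEntries() {
        return store.size();
    }
}
```

With only one limit there is a single condition to reason about, and the explicit check on the estimate turns a broken memory calculation into an immediate failure instead of an unbounded cache.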



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
