[PR] OAK-11277 Tree store: fix memory usage and support concurrent indexing [jackrabbit-oak]

via GitHub Thu, 21 Nov 2024 06:45:26 -0800


thomasmueller opened a new pull request, #1873:
URL: https://github.com/apache/jackrabbit-oak/pull/1873


   The PR fixes a bunch of issues:
   
   * IndexMeta: when re-indexing multiple indexes, each thread that tries to 
open a writer first reads _all_ metadata files (including ones added 
concurrently). The thread is only interested in the current index and ignores 
all others, and we synchronize on the current index. However, another thread 
might concurrently create _another_ index, whose metadata file could still be 
empty when this thread reads it. So protection against that is needed.
   * The OakDirectory uses a ConcurrentHashSet, but it doesn't properly 
synchronize on the node builder.
   * The MultiplexingIndexWriter doesn't support concurrent access (unlike the 
default index writer). This was not detected so far because multi-threaded 
indexing was usually only used for a single index. I tried reindexing all 
indexes with many threads, which uncovered this issue. (First, it uses a 
regular HashMap instead of a concurrent one... but more importantly, it could 
concurrently create two writers.)
   * The PipelinedTreeStoreStrategy doesn't support filtering yet, unlike the 
regular pipelined strategy.
   * TreeStore memory usage: the cache size calculation was wrong: it 
multiplied by the size factor twice. This could result in out-of-memory errors.
   * The FulltextBinaryTextExtractor didn't properly support concurrent 
initialization, leading to a NullPointerException.
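
   To illustrate the "two writers created concurrently" problem, here is a 
minimal sketch of the general pattern, not the actual change in this PR. The 
`WriterRegistry` and `Writer` classes and the mount-name key are hypothetical 
stand-ins. It assumes the writers are kept in a map keyed by name, and relies 
on `ConcurrentHashMap.computeIfAbsent`, which runs the factory at most once per 
absent key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical registry of writers; NOT Oak's MultiplexingIndexWriter. */
class WriterRegistry {

    /** Hypothetical writer type, identified by the mount it belongs to. */
    static final class Writer {
        final String mount;

        Writer(String mount) {
            this.mount = mount;
        }
    }

    // A concurrent map instead of a plain HashMap, so lookups are thread-safe.
    private final Map<String, Writer> writers = new ConcurrentHashMap<>();

    // Counts how many writers were ever constructed (for illustration only).
    private final AtomicInteger created = new AtomicInteger();

    /** Returns the writer for the mount, creating it at most once. */
    Writer getWriter(String mount) {
        // A separate get() followed by put() could let two threads both see
        // "absent" and both create a writer. computeIfAbsent performs the
        // check and the creation as one atomic step per key.
        return writers.computeIfAbsent(mount, m -> {
            created.incrementAndGet();
            return new Writer(m);
        });
    }

    int createdCount() {
        return created.get();
    }
}
```

   With this shape, two calls to `getWriter` for the same key return the same 
instance, even when they race.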
   
   For the concurrency issues, I added tests. They are pretty fast, because the 
corruption happens in memory and not on disk.
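
   As a rough idea of what such an in-memory concurrency check can look like 
(a sketch only, not one of the tests in this PR; the class name, method name, 
and the "index" key are made up for illustration): start many threads behind a 
latch so they hit the shared structure at the same moment, then verify that 
only one instance was ever created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; illustrates the test pattern, not Oak code. */
class ConcurrentInitCheck {

    /**
     * Lets the given number of threads request the same key at once and
     * returns how many instances were created in total.
     */
    static int countCreations(int threads) {
        Map<String, Object> cache = new ConcurrentHashMap<>();
        AtomicInteger created = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    // Block until all tasks are queued, to maximize contention.
                    start.await();
                    return cache.computeIfAbsent("index", k -> {
                        created.incrementAndGet();
                        return new Object();
                    });
                }));
            }
            start.countDown();
            Object first = results.get(0).get();
            for (Future<Object> f : results) {
                // Every thread must observe the same instance.
                if (f.get() != first) {
                    throw new IllegalStateException("two instances observed");
                }
            }
            return created.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

   Because everything lives in memory, a check like this can be repeated many 
times cheaply, which raises the chance of hitting a race if one exists.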


