thomasmueller opened a new pull request, #1873: URL: https://github.com/apache/jackrabbit-oak/pull/1873
This PR fixes several issues:

* IndexMeta: when re-indexing multiple indexes, each thread that tries to open a writer first reads _all_ metadata files (including those added concurrently). The thread is only interested in the current index, so all other indexes are ignored, and access to the current index is synchronized. The problem is that another thread might concurrently create _another_ index, whose metadata file could still be empty when this thread reads it. Protection against that case is needed.
* The OakDirectory uses a ConcurrentHashSet, but it doesn't properly synchronize on the node builder.
* The MultiplexingIndexWriter doesn't support concurrent access (unlike the default index writer). This was not detected so far because multi-threaded indexing was usually only used for a single index. I tried reindexing all indexes with many threads, which uncovered this issue. (First, it uses a regular HashMap instead of a concurrent one, but more importantly, it could concurrently create two writers.)
* The PipelinedTreeStoreStrategy doesn't support filtering yet, unlike the regular pipelined strategy.
* TreeStore memory usage: the cache size calculation was wrong because it multiplied by the size factor twice. This could result in an out-of-memory error.
* The FulltextBinaryTextExtractor didn't properly support concurrent initialization, leading to a NullPointerException.

For the concurrency issues, I added tests. They are fairly fast, because the corruption happens in memory and not on disk.
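To illustrate the MultiplexingIndexWriter item, here is a minimal sketch of one common way to avoid creating two writers for the same key: replace a plain HashMap check-then-put with `ConcurrentHashMap.computeIfAbsent`. This is not the code from the PR. `WriterRegistryDemo`, `Writer`, `getWriter`, `countCreatedWriters`, and the index name `/oak:index/example` are all hypothetical names chosen for the example, and the actual fix may use different synchronization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-index writer registry; not Oak code.
public class WriterRegistryDemo {

    /** Placeholder for a per-index writer that must exist only once. */
    static final class Writer {
        final String indexName;

        Writer(String indexName) {
            this.indexName = indexName;
        }
    }

    private final ConcurrentMap<String, Writer> writers = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    Writer getWriter(String indexName) {
        // computeIfAbsent applies the factory at most once per key, atomically,
        // so two threads asking for the same index get the same writer.
        return writers.computeIfAbsent(indexName, name -> {
            created.incrementAndGet();
            return new Writer(name);
        });
    }

    /** Requests the same writer from many threads; returns how many were created. */
    static int countCreatedWriters(int threads) {
        WriterRegistryDemo registry = new WriterRegistryDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same time
                    registry.getWriter("/oak:index/example"); // hypothetical name
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.created.get();
    }

    public static void main(String[] args) {
        System.out.println("writers created: " + countCreatedWriters(8));
    }
}
```

With a plain HashMap and a separate `get` followed by `put`, two threads can both see a missing entry and each create a writer. The sketch only covers that one race. It does not address the IndexMeta or OakDirectory issues.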