Re: Control the number of segments without using forceMerge.

2021-07-05 Thread Alex K
After some more reading, the NoMergePolicy seems to mostly solve my problem. I've configured my IndexWriterConfig with: .setMaxBufferedDocs(Integer.MAX_VALUE) .setRAMBufferSizeMB(Double.MAX_VALUE) .setMergePolicy(NoMergePolicy.INSTANCE) With this config I consistently end up with a n

Re: Control the number of segments without using forceMerge.

2021-07-05 Thread Alex K
Ok, so it sounds like if you want a very specific number of segments you have to do a forceMerge at some point? Is there some simple summary on how segments are formed in the first place? Something like, "one segment is created every time you flush from an IndexWriter"? Based on some experimenting

Re: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Alex K
Hi Uwe, Thanks for clarifying. That makes sense. Thanks, Alex Klibisz On Mon, Jul 5, 2021 at 9:22 AM Uwe Schindler wrote: > Hi, > > Sorry, I misunderstood your question: you want to look up the UUID in another > system! > Then the approach you are taking is correct. Either store as stored field > or

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
Hi, Sorry, I misunderstood your question: you want to look up the UUID in another system! Then the approach you are taking is correct. Either store as stored field or as docvalue. An inverted index cannot store additional data because it *is* inverted: it is organized around *terms*, not documents. T
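
To illustrate the point about terms versus documents, here is a toy sketch in plain Java. It is not Lucene's API or on-disk format; the class ToyInvertedIndex and its methods are hypothetical names made up for this example. The postings map goes from a term to document ids and has no slot for a per-document payload, so the UUID lives in a separate per-document structure that plays the role of a stored field or doc value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; this is not how Lucene is implemented. */
public class ToyInvertedIndex {
    // term -> ids of the documents that contain the term (the "inverted" part)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // docId -> UUID, kept outside the postings (stands in for a stored field or doc value)
    private final List<String> storedUuid = new ArrayList<>();

    /** Adds a document with the given UUID and terms, and returns its id. */
    public int add(String uuid, String... terms) {
        int docId = storedUuid.size();
        storedUuid.add(uuid);
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    /** Term lookup: yields document ids only, with no per-document data attached. */
    public List<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    /** Per-document lookup, performed after the search has produced a document id. */
    public String uuidOf(int docId) {
        return storedUuid.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("uuid-a", "lucene", "segments");
        index.add("uuid-b", "lucene", "docvalues");
        for (int docId : index.search("lucene")) {
            System.out.println(docId + " -> " + index.uuidOf(docId));
        }
    }
}
```

The search step and the UUID retrieval are two separate lookups here, which mirrors the advice in the message: match on terms first, then read the identifier from a per-document structure.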

RE: Control the number of segments without using forceMerge.

2021-07-05 Thread Uwe Schindler
If you want an exact number of segments, create 64 indexes, each forceMerged to one segment. After that use MultiReader to create a view on all separate indexes. MultiReader's contents are always flattened to a list of those 64 indexes. But keep in mind that this should only ever be done with *

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
You need to index the UUID as a standard indexed StringField. Then you can do a lookup using TermQuery. That's how all systems like Solr or Elasticsearch handle document identifiers. DocValues are for faceting and sorting, but looking up by ID is a typical use case for an inverted index. If yo