After some more reading, the NoMergePolicy seems to mostly solve my problem.
I've configured my IndexWriterConfig with:
.setMaxBufferedDocs(Integer.MAX_VALUE)
.setRAMBufferSizeMB(Double.MAX_VALUE)
.setMergePolicy(NoMergePolicy.INSTANCE)
With this config I consistently end up with a n
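For reference, the settings above can be assembled into a complete config as in the sketch below. The class and method names (`NoMergeConfig`, `build`) are hypothetical, and the no-argument `IndexWriterConfig` constructor (which falls back to a default analyzer) is used only to keep the sketch short. Note that `commit()`, `close()`, and opening a near-real-time reader still flush regardless of these thresholds.

```java
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;

// Hypothetical helper; only the three setter calls come from the message above.
class NoMergeConfig {
    static IndexWriterConfig build() {
        return new IndexWriterConfig()
            // Never trigger a flush based on the number of buffered documents.
            .setMaxBufferedDocs(Integer.MAX_VALUE)
            // Make the RAM-usage flush trigger effectively unreachable.
            .setRAMBufferSizeMB(Double.MAX_VALUE)
            // Never merge segments in the background.
            .setMergePolicy(NoMergePolicy.INSTANCE);
    }
}
```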
Ok, so it sounds like if you want a very specific number of segments you
have to do a forceMerge at some point?
Is there some simple summary on how segments are formed in the first place?
Something like, "one segment is created every time you flush from an
IndexWriter"? Based on some experimenting
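One way to test the "one segment per flush" intuition is to commit several batches with merging disabled and count the resulting leaves. This is only a minimal sketch under stated assumptions: indexing is single-threaded, the index lives in an in-memory `ByteBuffersDirectory`, and the class, method, and field names (`SegmentsPerCommit`, `countSegments`, `id`) are invented for illustration. With several indexing threads, each thread's buffer can be flushed as its own segment, so the count may be higher.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class SegmentsPerCommit {
    // Indexes `commits` batches of `docsPerCommit` documents, committing after
    // each batch, and returns the number of segments (leaves) in the index.
    static int countSegments(int commits, int docsPerCommit) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig cfg =
            new IndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE);
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            for (int c = 0; c < commits; c++) {
                for (int d = 0; d < docsPerCommit; d++) {
                    Document doc = new Document();
                    doc.add(new StringField("id", c + "-" + d, Field.Store.NO));
                    writer.addDocument(doc);
                }
                // commit() flushes the buffered documents to a new segment.
                writer.commit();
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            // Each leaf of a DirectoryReader corresponds to one segment.
            return reader.leaves().size();
        }
    }
}
```

Calling `SegmentsPerCommit.countSegments(3, 5)` in this setup should report one segment per commit, because nothing is allowed to merge them afterwards.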
Hi Uwe,
Thanks for clarifying. That makes sense.
Thanks,
Alex Klibisz
On Mon, Jul 5, 2021 at 9:22 AM Uwe Schindler wrote:
> Hi,
>
> Sorry, I misunderstood your question: you want to look up the UUID in another
> system!
> Then your approach is correct. Either store it as a stored field
> or
Hi,
Sorry, I misunderstood your question: you want to look up the UUID in another
system!
Then your approach is correct. Either store it as a stored field or as a
doc value. An inverted index cannot store additional data, because it *is*
inverted: it is organized around *terms*, not documents. T
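A rough sketch of the two storage options mentioned above: the UUID is written both as a stored field and as a binary doc value, and then read back for a given document ID through the doc-values API. The class, method, and field names (`UuidPerDocument`, `uuidForDoc`, `uuid`, `uuid_dv`) are assumptions made for illustration. Only the doc-values read path is shown, because the stored-fields read API differs between Lucene versions.

```java
import java.io.IOException;
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

class UuidPerDocument {
    // Writes one document per UUID, keeping the UUID retrievable per document.
    static Directory buildIndex(String... uuids) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (String uuid : uuids) {
                Document doc = new Document();
                // Option 1: stored field (row-oriented, fetched with the document).
                doc.add(new StoredField("uuid", uuid));
                // Option 2: doc value (column-oriented, addressed by doc ID).
                doc.add(new BinaryDocValuesField("uuid_dv", new BytesRef(uuid)));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    // Returns the UUID stored as a doc value for `docId`, or null if absent.
    static String uuidForDoc(Directory dir, int docId) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            BinaryDocValues values = MultiDocValues.getBinaryValues(reader, "uuid_dv");
            if (values != null && values.advanceExact(docId)) {
                return values.binaryValue().utf8ToString();
            }
            return null;
        }
    }
}
```

In a real search, `docId` would come from a hit returned by a query; the UUID read here is what would then be used to look the record up in the other system.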
If you want an exact number of segments, create 64 indexes, each forceMerged to
one segment.
After that use MultiReader to create a view on all separate indexes.
MultiReader's contents are always flattened to a list of those 64 indexes.
But keep in mind that this should only ever be done with *
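The arrangement described above could look roughly like the following sketch, which builds several small indexes, force-merges each to one segment, and opens them together through a `MultiReader`. The class and method names (`FixedSegmentView`, `buildSingleSegmentIndex`, `leafCount`), the `id` field, and the in-memory directories are assumptions for illustration; a real setup would presumably use on-disk directories.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class FixedSegmentView {
    // Builds one index and force-merges it down to a single segment.
    static Directory buildSingleSegmentIndex(int shard, int docs) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (int d = 0; d < docs; d++) {
                Document doc = new Document();
                doc.add(new StringField("id", shard + "-" + d, Field.Store.NO));
                writer.addDocument(doc);
            }
            writer.forceMerge(1);
        }
        return dir;
    }

    // Opens `shards` single-segment indexes behind one MultiReader and
    // returns how many leaves (segments) the combined view exposes.
    static int leafCount(int shards, int docsPerShard) throws IOException {
        IndexReader[] readers = new IndexReader[shards];
        for (int i = 0; i < shards; i++) {
            readers[i] = DirectoryReader.open(buildSingleSegmentIndex(i, docsPerShard));
        }
        // This constructor closes the sub-readers when the MultiReader is closed.
        try (MultiReader multi = new MultiReader(readers)) {
            return multi.leaves().size();
        }
    }
}
```

With 64 shards, `FixedSegmentView.leafCount(64, n)` for any positive `n` should expose exactly 64 leaves, one per underlying index.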
You need to index the UUID as a standard indexed StringField. Then you can do a
lookup using TermQuery. That's how all systems like Solr or Elasticsearch
handle document identifiers.
DocValues are for faceting and sorting, but looking up by ID is a typical use
case for an inverted index. If yo
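As an illustration of the lookup described above, the sketch below indexes each UUID as a `StringField` and finds it again with a `TermQuery`. The class and method names (`UuidLookup`, `buildIndex`, `countHits`), the field name `uuid`, and the in-memory directory are assumptions, not taken from the thread.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class UuidLookup {
    // Indexes each UUID as a single, untokenized term.
    static Directory buildIndex(String... uuids) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (String uuid : uuids) {
                Document doc = new Document();
                // StringField indexes the whole value verbatim, which is what
                // an exact-match identifier lookup needs.
                doc.add(new StringField("uuid", uuid, Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        return dir;
    }

    // Returns the number of documents whose "uuid" term equals `uuid`.
    static int countHits(Directory dir, String uuid) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("uuid", uuid)), 10);
            return hits.scoreDocs.length;
        }
    }
}
```

Each entry in `hits.scoreDocs` carries the internal document ID of a match, which can then be used to load the rest of the document.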