On Nov 13, 2005, at 6:27 PM, Chris Hostetter wrote:
> I believe if you really want to determine settings like this after
> building the index, you'll need to do an initial build of the index
> using best-guess values -- then if the calculations you do once the
> index is built aren't close enough to your guesses to satisfy you,
> change the value and optimize.

Good tip. :)

> From what I remember about how optimize works, it creates all new
> segments regardless of the previous state of the index -- and those
> new segments should use the newly set values.

Yes, it will use the new values. The one caveat is that if the index
is already optimized, calling optimize() won't do anything. Adding
or deleting a single document is enough to trigger a rewrite, though.
Daniel, under the hood, there are two term dictionary files, with
nearly identical structures: the main .tis file, and the index .tii
file. (Mnemonic: .tis is TermInfoS, and .tii is TermInfosIndex.) If
indexInterval is set to the default of 128, then the .tii file
contains every 128th entry from the main file, plus a pointer to
where that entry is located in the main file.
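
To make the sampling concrete, here is a toy model in Python (not Lucene's Java, and not its on-disk format). The function names and the in-memory lists are made up for illustration, and treating the very first term as an index entry is a simplifying assumption. It only shows the idea: keep every Nth term plus its position, binary-search that small list, then scan at most one interval of the main list.

```python
import bisect

def build_term_index(sorted_terms, index_interval=128):
    """Toy stand-in for the .tii file: every index_interval-th term,
    paired with its position in the main (toy .tis) list."""
    return [(term, pos)
            for pos, term in enumerate(sorted_terms)
            if pos % index_interval == 0]

def lookup(sorted_terms, term_index, target, index_interval=128):
    """Return the position of target in sorted_terms, or -1 if absent."""
    # Binary search over the small sampled list only.
    keys = [term for term, _ in term_index]
    slot = bisect.bisect_right(keys, target) - 1
    if slot < 0:
        return -1  # target sorts before the first term
    # Scan forward through at most one interval of the main list.
    start = term_index[slot][1]
    stop = min(start + index_interval, len(sorted_terms))
    for pos in range(start, stop):
        if sorted_terms[pos] == target:
            return pos
        if sorted_terms[pos] > target:
            break
    return -1

# Hypothetical data: 1000 zero-padded terms, already in sorted order.
terms = sorted("term%05d" % i for i in range(1000))
index = build_term_index(terms)
```

In this model a larger interval means a smaller sampled list but a longer scan per lookup, which is the trade-off you would be tuning.
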
When you load up an IndexReader, the entire .tii file gets
decompressed and loaded into RAM. The number of entries in the .tii
file corresponds directly to the RAM footprint. If you want less RAM
usage, that file has to get smaller.
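
As a back-of-the-envelope check, the entry count can be estimated before rebuilding anything. This Python sketch assumes roughly one .tii entry per indexInterval terms; the function name and the 10 million term figure are hypothetical, and it says nothing about bytes per entry, which depends on your terms.

```python
def tii_entry_count(num_terms, index_interval):
    """Approximate number of sampled entries: one per index_interval
    terms, rounded up (integer ceiling division)."""
    return (num_terms + index_interval - 1) // index_interval

# Hypothetical index with 10 million unique terms.
num_terms = 10_000_000
for interval in (64, 128, 256, 512):
    print(interval, tii_entry_count(num_terms, interval))
```

Doubling the interval roughly halves the number of entries held in RAM.
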
Hoss's solution is the fastest way to find the best values for
indexInterval -- you're rewriting the entire index, but it's faster
than reindexing from scratch because you don't need to redo the IO or
the analysis. Few people will find it useful to tinker with this,
but you're the exception, and I'll be interested to hear about your
findings.
Best,
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/