On Nov 13, 2005, at 6:27 PM, Chris Hostetter wrote:

> I believe if you really want to determine settings like this after
> building the index, you'll need to do an initial build of the index using
> best guess values -- then if the calculations you do once the index is
> built aren't close enough to your guesses to satisfy you, change the value
> and optimize.

Good tip.  :)

> From what I remember about how optimize works, it creates all new segments
> regardless of the previous state of the index -- and those new segments
> should use the newly set values.

Yes, it will use the new values. The one caveat is that if the index is already optimized, calling optimize() won't do anything. Adding or deleting a single document is enough to trigger a rewrite, though.

Daniel, under the hood, there are two term dictionary files, with nearly identical structures: the main .tis file, and the index .tii file. (Mnemonic: .tis is TermInfoS, and .tii is TermInfosIndex.) If indexInterval is set to the default of 128, then the .tii file contains every 128th entry from the main file, plus a pointer to where that entry is located in the main file.
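
A minimal sketch of the two-level lookup described above. Everything in it is hypothetical: `TermDictionarySketch` is an in-memory stand-in, not Lucene's file format or API, and it ignores the compression and per-term data the real files carry. The full sorted array plays the role of .tis, and the sampled array (every indexInterval-th term plus its position) plays the role of .tii.

```java
import java.util.Arrays;

/**
 * Hypothetical in-memory model of a two-level term dictionary.
 * main            ~ .tis: every term, sorted
 * indexTerms/Ptrs ~ .tii: every indexInterval-th term plus its position in main
 * Not Lucene's actual format or API.
 */
public class TermDictionarySketch {
    private final String[] main;
    private final int indexInterval;
    private final String[] indexTerms;
    private final int[] indexPointers;

    public TermDictionarySketch(String[] sortedTerms, int indexInterval) {
        if (indexInterval <= 0) throw new IllegalArgumentException("indexInterval must be positive");
        this.main = sortedTerms.clone();
        this.indexInterval = indexInterval;
        int n = (main.length + indexInterval - 1) / indexInterval; // ceil(terms / interval)
        indexTerms = new String[n];
        indexPointers = new int[n];
        // Sample entries 0, interval, 2*interval, ... and remember where each one lives.
        for (int i = 0, j = 0; i < main.length; i += indexInterval, j++) {
            indexTerms[j] = main[i];
            indexPointers[j] = i;
        }
    }

    /** Number of sampled entries, i.e. what would be held in RAM. */
    public int indexSize() {
        return indexTerms.length;
    }

    /** Position of term in the main dictionary, or -1 if it is absent. */
    public int lookup(String term) {
        if (indexTerms.length == 0) return -1;
        // Binary search the small in-RAM index for the last sampled term <= term.
        int pos = Arrays.binarySearch(indexTerms, term);
        int slot = pos >= 0 ? pos : -pos - 2; // (insertion point) - 1
        if (slot < 0) return -1;              // term sorts before the first term
        // Scan forward through at most indexInterval entries of the main dictionary.
        int start = indexPointers[slot];
        int end = Math.min(start + indexInterval, main.length);
        for (int i = start; i < end; i++) {
            int cmp = main[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break; // passed where the term would be
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) terms[i] = String.format("term%04d", i);
        TermDictionarySketch dict = new TermDictionarySketch(terms, 128);
        System.out.println(dict.indexSize());
        System.out.println(dict.lookup("term0500"));
        System.out.println(dict.lookup("nope"));
    }
}
```

Keeping only every indexInterval-th term in RAM is what trades memory for a bounded forward scan: with an interval of 128, a lookup reads at most 128 main-file entries after the binary search.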

When you load up an IndexReader, the entire .tii file gets decompressed and loaded into RAM. The number of entries in the .tii file corresponds directly to the RAM footprint. If you want less RAM usage, that file has to get smaller.
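
Since the .tii holds one entry per indexInterval terms, its entry count is the term count divided by the interval, rounded up, so doubling the interval roughly halves the RAM. A small estimator, assuming a hypothetical average `bytesPerEntry` (the real per-entry cost depends on term lengths and the JVM and is not given here):

```java
/** Hypothetical estimator for the in-RAM size of a sampled term index. */
public class TiiFootprintEstimate {
    /** Sampled entries for numTerms terms at the given interval: ceil(numTerms / indexInterval). */
    static long indexEntries(long numTerms, int indexInterval) {
        if (indexInterval <= 0) throw new IllegalArgumentException("indexInterval must be positive");
        return (numTerms + indexInterval - 1) / indexInterval;
    }

    /** Rough RAM estimate; bytesPerEntry is an assumed average, not a measured Lucene figure. */
    static long estimatedBytes(long numTerms, int indexInterval, long bytesPerEntry) {
        return indexEntries(numTerms, indexInterval) * bytesPerEntry;
    }

    public static void main(String[] args) {
        long numTerms = 50_000_000L; // hypothetical term count
        long bytesPerEntry = 100;    // hypothetical average cost per in-RAM entry
        for (int interval : new int[] {32, 128, 512, 1024}) {
            System.out.printf("interval %4d -> %,d entries, ~%,d bytes%n",
                    interval,
                    indexEntries(numTerms, interval),
                    estimatedBytes(numTerms, interval, bytesPerEntry));
        }
    }
}
```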

Hoss's solution is the fastest way to find the best values for indexInterval -- you're rewriting the entire index, but it's faster than reindexing from scratch because you don't need to redo the IO or the analysis. Few people will find it useful to tinker with this, but you're the exception, and I'll be interested to hear about your findings.
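
Once the initial build gives you the actual term count, choosing a new value becomes a small search. The hypothetical `chooseInterval` helper below returns the smallest power-of-two interval whose estimated .tii footprint fits a RAM budget. The reasoning is that smaller intervals mean shorter scans of the main file per lookup. The budget, `bytesPerEntry`, and the power-of-two restriction are all assumptions for illustration. The chosen value would then be set before the optimize() rewrite Hoss describes.

```java
/** Hypothetical helper for picking an indexInterval under a RAM budget (not a Lucene API). */
public class IntervalChooser {
    /**
     * Smallest power-of-two interval whose estimated term-index footprint
     * fits in ramBudgetBytes, or -1 if even maxInterval does not fit.
     */
    static int chooseInterval(long numTerms, long bytesPerEntry,
                              long ramBudgetBytes, int maxInterval) {
        // long loop variable so doubling cannot overflow near Integer.MAX_VALUE
        for (long interval = 1; interval <= maxInterval; interval *= 2) {
            long entries = (numTerms + interval - 1) / interval; // ceil(numTerms / interval)
            if (entries * bytesPerEntry <= ramBudgetBytes) {
                return (int) interval;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long numTerms = 50_000_000L;     // hypothetical: measured after the initial build
        long bytesPerEntry = 100;        // hypothetical average in-RAM cost per entry
        long budget = 64L * 1024 * 1024; // hypothetical 64 MB budget for the term index
        int interval = chooseInterval(numTerms, bytesPerEntry, budget, 4096);
        System.out.println("suggested indexInterval: " + interval);
    }
}
```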

Best,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

