On Nov 13, 2005, at 10:22 PM, Daniel Noll wrote:

Okay, I've gone and revised how things are fitting together in our app. It seems that we already call optimize() at the end of all the processing, before which I could figure out what kind of value we should be using and call this setter method which I'll patch into the version we're running.

That may be a little tricky... indexInterval is set at the IndexWriter level, but it has to propagate downwards. Where it actually makes a difference is in TermInfosWriter. (TermInfosWriter creates a doppelganger and adds a term to the doppelganger whenever the loop counter modulo indexInterval is zero, i.e. every indexInterval-th term.) IIRC, it has to get there via a chain of two constructors. Those constructors might be the same in 1.4.3, but probably not, if indexInterval wasn't settable then. I think this number used to be a constant at one time. This stuff is all implementation details in private classes, so we're talking unsupported hackery... if updating to the current trunk isn't feasible, it may not be worth it.
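To make the every-indexInterval-th-term idea concrete, here is a minimal standalone sketch. It is not the actual TermInfosWriter code; the class name SampledTermIndex and the method sample are made up for illustration, and the term dictionary is modelled as a plain sorted list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT Lucene's TermInfosWriter.
public final class SampledTermIndex {
    private SampledTermIndex() {}

    /**
     * Copies every indexInterval-th term of the sorted dictionary into a
     * smaller "index" list, starting with the first term.
     */
    public static List<String> sample(List<String> sortedTerms, int indexInterval) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        List<String> index = new ArrayList<>();
        for (int i = 0; i < sortedTerms.size(); i++) {
            // Term goes into the small index only when the counter hits the interval.
            if (i % indexInterval == 0) {
                index.add(sortedTerms.get(i));
            }
        }
        return index;
    }
}
```

Doubling indexInterval roughly halves the size of the sampled list, which is the part that gets cached in memory at search time.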

My logic will probably just say that each index is allowed to store X terms, so if the number of terms is greater than some value, I'll double the indexInterval until it comes to some amount which _should_ fit under that size.
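The doubling logic described above could look roughly like the sketch below. The names IndexIntervalPicker, pickInterval, startInterval and maxIndexTerms are assumptions made for the example, and the estimate that the in-memory index holds about termCount / indexInterval terms is a simplification.

```java
// Hypothetical helper: none of these names come from Lucene.
public final class IndexIntervalPicker {
    private IndexIntervalPicker() {}

    /**
     * Doubles the interval until the estimated number of sampled index
     * terms (termCount / interval, rounded up) fits within maxIndexTerms.
     */
    public static int pickInterval(long termCount, int startInterval, long maxIndexTerms) {
        if (startInterval < 1 || maxIndexTerms < 1) {
            throw new IllegalArgumentException("startInterval and maxIndexTerms must be >= 1");
        }
        int interval = startInterval;
        // Stop doubling before the int would overflow.
        while ((termCount + interval - 1) / interval > maxIndexTerms
                && interval <= Integer.MAX_VALUE / 2) {
            interval *= 2;
        }
        return interval;
    }

    public static void main(String[] args) {
        // 10 million terms, a starting interval of 128, a budget of 20,000 index terms.
        System.out.println(pickInterval(10_000_000L, 128, 20_000L));
    }
}
```

With those example numbers, 128 and 256 both leave more than 20,000 sampled terms, so the loop settles on 512.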

Sure. You're just increasing the number of terms the search app has to scan through in the .tis file after it gets in the ballpark by consulting the cached .tii information.
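A rough way to see that cost: the sketch below models the lookup as a binary search over the sampled positions followed by a linear scan of the full term list, and counts how many entries the scan reads. It is a simplified model, not Lucene's reader code; TermLookupSketch and termsScanned are hypothetical names, and a sorted list of strings stands in for the .tis file.

```java
import java.util.List;

// Hypothetical model of "get in the ballpark, then scan"; not Lucene code.
public final class TermLookupSketch {
    private TermLookupSketch() {}

    /** Returns how many full-dictionary entries are read while seeking target. */
    public static int termsScanned(List<String> sortedTerms, int indexInterval, String target) {
        if (indexInterval < 1) {
            throw new IllegalArgumentException("indexInterval must be >= 1");
        }
        if (sortedTerms.isEmpty()) {
            return 0;
        }
        // Binary search over sampled slots 0, 1, 2, ... (term positions slot * indexInterval)
        // for the last sampled term that is <= target.
        int lo = 0;
        int hi = (sortedTerms.size() - 1) / indexInterval;
        int slot = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid * indexInterval).compareTo(target) <= 0) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Linear scan through the full dictionary from the sampled position.
        int scanned = 0;
        for (int i = slot * indexInterval; i < sortedTerms.size(); i++) {
            scanned++;
            if (sortedTerms.get(i).compareTo(target) >= 0) {
                break; // reached the target, or passed where it would be
            }
        }
        return scanned;
    }
}
```

In this model the scan never reads more than indexInterval entries, so doubling the interval at most doubles the scan per lookup while halving the cached index.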

If I can also remove smaller junk words, we'll save even more space due to having fewer terms in total.

Hmm... have you not experimented with stoplists, in StopFilter, StopAnalyzer, or StandardAnalyzer? If you haven't, you almost certainly want to do that before asking for trouble by kludging setIndexInterval into 1.4.3. The internals of TermInfosWriter are quite complex.

Best,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

