Mark Harwood wrote:


> I've been building a large index (hundreds of millions) of mainly structured data, consisting of several fields with mostly unique values. I've been hitting out-of-memory errors when doing periodic commits/closes, which I suspect is down to the sheer number of terms.
>
> I set IndexWriter.setTermIndexInterval to 8 times the default of 128 (an interval of 1024), which delayed the onset of the issue but still failed.

I think that setting won't change how much RAM is used when writing.
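
FWIW, termIndexInterval only changes how densely the term dictionary index (the .tii file) is sampled: readers load 1/interval of all terms into RAM, so a larger value mostly saves memory on the *reader* side, not in the writer's indexing buffer. A minimal sketch, assuming the Lucene 2.x/3.0-style setter API (the path is just a placeholder):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TermIndexIntervalSketch {
      public static void main(String[] args) throws Exception {
        // A larger termIndexInterval shrinks the in-memory term index that
        // readers load (at some cost in term-seek speed); it does not bound
        // the RAM the writer uses while buffering documents.
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("/path/to/index")),  // hypothetical path
            new StandardAnalyzer(Version.LUCENE_30),
            IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setTermIndexInterval(1024);  // default is 128
        writer.close();
      }
    }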

> I'd like to get a little more scientific about what to set here, rather than simply experimenting with settings and hoping it doesn't fail again.

> Does anyone have a decent model worked out for how much memory is consumed at peak? I'm guessing the contributing factors are:
>
> * Number of fields
> * Number of unique terms per field
> * Number of segments?

The net number of unique terms (across all fields) is a big driver, but so are the net number of term occurrences and the number of docs. Lots of tiny docs take more RAM than fewer large docs when the total number of occurrences is equal.
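
Rather than modelling this from first principles, you can also just watch the writer's buffer while you index. A rough sketch, reusing the writer from above (ramSizeInBytes() is the 2.x/3.0 accessor for buffered-docs RAM; 'docs' is a hypothetical iterable of Documents):

    // Print the RAM consumed by buffered docs/deletes every 100K docs, to
    // see how doc count and unique terms translate into actual memory use.
    int count = 0;
    for (Document doc : docs) {
      writer.addDocument(doc);
      if (++count % 100000 == 0) {
        System.out.println("buffered RAM: "
            + (writer.ramSizeInBytes() / (1024 * 1024)) + " MB");
      }
    }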

But... how come setting IW's RAM buffer doesn't prevent the OOMs? IW should simply flush once it has used that much RAM.
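
To be explicit, something like this should cap indexing RAM (2.3+ API; the 256 MB figure is just illustrative):

    // Flush when buffered docs reach ~256 MB instead of at a doc count;
    // DISABLE_AUTO_FLUSH turns off the doc-count trigger entirely.
    writer.setRAMBufferSizeMB(256.0);
    writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);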

I don't think number of segments is a factor.

Though mergeFactor is, since during merging the SegmentMerger holds a SegmentReader open, plus an int[] doc-ID map (if there are any deletes), for each segment being merged. Do you have a large merge taking place when you hit the OOMs?
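
If it is a merge, one knob to try is lowering mergeFactor so that fewer segments (and thus fewer SegmentReaders and delete maps) are held open per merge; the value below is just illustrative:

    // Merge fewer segments at a time; the default mergeFactor is 10.
    // Lower values reduce peak merge RAM at the cost of more merges.
    writer.setMergeFactor(4);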

Mike
