mark harwood wrote:
> I've been building a large index (hundreds of millions) with mainly
> structured data consisting of several fields with mostly unique values.
> I've been hitting out-of-memory issues when doing periodic commits/
> closes, which I suspect is down to the sheer number of terms.
> I set IndexWriter.setTermIndexInterval() to 8 times the default of 128
> (an interval of 1024), which delayed the onset of the problem but it
> still failed.
I think that setting won't change how much RAM is used when writing.
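Roughly, and assuming the 2.4-era IndexWriter API (RAMDirectory and
StandardAnalyzer here are just stand-ins for your own setup), a minimal
sketch of where the two settings apply: the term index interval only thins
what readers load, while the RAM buffer is what bounds indexing RAM.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class WriterSettingsSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();  // stand-in for your FSDirectory
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);

    // Controls how densely the term dictionary is sampled into the .tii
    // file, i.e. RAM used by IndexReaders at search time, not the RAM
    // the writer buffers while indexing.
    writer.setTermIndexInterval(1024);

    // This is the setting that bounds indexing-time RAM: the writer
    // flushes a new segment once buffered postings reach this size.
    writer.setRAMBufferSizeMB(64.0);

    writer.close();
  }
}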
> I'd like to get a little more scientific about what to set here
> rather than simply experimenting with settings and hoping it doesn't
> fail again.
> Does anyone have a decent model worked out for how much memory is
> consumed at peak? I'm guessing the contributing factors are:
> * Number of fields
> * Number of unique terms per field
> * Number of segments?
The net number of unique terms (across all fields) is a big driver, but
so are the net number of term occurrences and the number of docs. Lots
of tiny docs take more RAM than fewer large docs, when the number of
occurrences is equal.
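One rough way to see how those factors add up, sticking with the 2.4-era
API (the "id" field and the doc counts below are just illustrative), is
to poll IndexWriter.ramSizeInBytes() as you add documents:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class RamGrowthSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new RAMDirectory(),
        new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

    for (int i = 0; i < 1000000; i++) {
      Document doc = new Document();
      // Mostly-unique values, like the structured fields described above
      doc.add(new Field("id", "id-" + i,
          Field.Store.NO, Field.Index.NOT_ANALYZED));
      writer.addDocument(doc);

      if (i % 100000 == 0) {
        // RAM currently held by the writer's buffered postings
        System.out.println(i + " docs, ramSizeInBytes="
            + writer.ramSizeInBytes());
      }
    }
    writer.close();
  }
}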
But... how come setting IW's RAM buffer doesn't prevent the OOMs? IW
should simply flush once it has used that much RAM.
I don't think number of segments is a factor.
Though mergeFactor is, since during merging the SegmentMerger holds
SegmentReaders open, plus int[] maps (if there are any deletes), for
each segment being merged. Do you have a large merge taking place when
you hit the OOMs?
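If they do coincide, one option is to make merges smaller. A sketch
along those lines, again with illustrative values rather than
recommendations:

import org.apache.lucene.index.IndexWriter;

public class MergeSettingsSketch {
  // Assumes 'writer' is the IndexWriter from the earlier sketch.
  static void configureMerging(IndexWriter writer) {
    // Fewer segments merged at once means fewer SegmentReaders (and
    // per-segment int[] maps, when there are deletes) held in RAM
    // during a merge.
    writer.setMergeFactor(5);

    // Optionally cap the size of segments that will be merged, so the
    // very largest merges never happen inside this process.
    writer.setMaxMergeDocs(10 * 1000 * 1000);
  }
}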
Mike