Hmm, I agree we could be more RAM efficient if the field is DOCS_ONLY. We shouldn't have to allocate/use docFreqs, lastDocCodes, lastPositions arrays (3 of the 7); the others are still needed, I think.
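For a rough sense of scale, here is a back-of-the-envelope sketch (not Lucene code) of what dropping 3 of the 7 parallel int[] arrays would save per unique term. The class name, method names, and the 1,000,000-term figure are made up for illustration. It counts only 4 bytes per int slot; it ignores array headers, over-allocation when the arrays grow, and the separate pools that hold the term bytes, so real numbers will be higher.

```java
// Back-of-the-envelope estimate only; hypothetical helper, not part of Lucene.
final class PostingsArrayEstimate {
    // A Java int always occupies 4 bytes.
    static final int BYTES_PER_INT = Integer.BYTES;

    /** Bytes consumed per unique term by the given number of parallel int[] arrays. */
    static long bytesPerTerm(int arrays) {
        return (long) arrays * BYTES_PER_INT;
    }

    /** Total bytes for uniqueTerms slots; ignores array headers and growth slack. */
    static long totalBytes(int arrays, long uniqueTerms) {
        return bytesPerTerm(arrays) * uniqueTerms;
    }

    public static void main(String[] args) {
        // Assumed number of unique terms buffered before a flush (illustrative only).
        long terms = 1_000_000L;
        System.out.println("all 7 arrays:     " + totalBytes(7, terms) + " bytes");
        System.out.println("4 arrays (7 - 3): " + totalBytes(4, terms) + " bytes");
    }
}
```

So the saving would be 12 of 28 bytes per unique term in these arrays, under the assumptions above.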
But, that said, you shouldn't hit OOME, as long as your max heap size is large enough (and your IndexWriterConfig's RAMBufferSizeMB is small enough); Lucene should simply flush a new segment once the buffered documents are using too much RAM. Hmm, and that assumes you don't index massive documents. How many UUIDs per document?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Mar 19, 2012 at 3:29 PM, Ken McCracken <ken.mccrac...@gmail.com> wrote:
> Hi,
>
> I am using lucene-3.5 and getting an OutOfMemoryError on a large indexing
> task of 100M documents. I am creating an index with 3 UUIDs as separate
> field values. I am using Store.YES on 1 of them and Store.NO on the
> others; I am using Index.NOT_ANALYZED_NO_NORMS on all three; explicitly
> setting
> field.setIndexOptions(IndexOptions.DOCS_ONLY); and
> indexWriterConfig.setTermIndexInterval(termIndexInterval); to 1024. I am
> trying to index 100M records into my index.
>
> Is there any reason FreqProxTermsWriterPerField.FreqProxPostingsArray needs
> to be constructed even though I have the positions etc suppressed? It
> seems that the reason I get an OutOfMemoryError is that 7 int[] of size
> proportional to number of unique fields are being constructed; however, at
> least some of them are probably wasteful given my indexing configurations.
>
> Any help is appreciated.
>
> Thanks,
> -Ken
>
> [junit] Error:
> [junit] Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
> [junit]   at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:35)
> [junit]   at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:190)
> [junit]   at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
> [junit]   at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
> [junit]   at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
> [junit]   at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
> [junit]   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:94)
> [junit]   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)