First off, IndexWriter's RAM buffer size is "approximate": after each
doc is added, we check whether the RAM consumed is greater than our
budget, and if so, we flush.
When you add a doc that's larger than the RAM buffer size, all that
will happen is that we flush after that doc is indexed. In other
words, that doc will cause IndexWriter to use more RAM than its
budget, until it flushes.
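For example (a sketch against the 2.x API; the exact IndexWriter
constructor varies by release, and the path is illustrative):

    IndexWriter writer = new IndexWriter("/path/to/index",
                                         new StandardAnalyzer(), true);
    writer.setRAMBufferSizeMB(16.0);  // the "budget", checked after each added doc
    // A doc larger than 16 MB still gets indexed; the writer simply
    // exceeds its budget until the flush right after that doc.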
IndexWriter never throws OOME itself. The fact that you're hitting it
means your JRE is starved for RAM. You should try increasing its
allowed max heap size (e.g. -Xmx2048M), making sure you have enough
physical RAM on the machine to not start thrashing.
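For example (MyIndexer is just a placeholder for your main class and
classpath):

    java -Xmx2048M -cp lucene-core.jar:. MyIndexer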
Second, whenever a document hits an exception during indexing, one of
two things will happen in response. If the exception comes at a bad
time, meaning it may have corrupted the internal RAM buffer,
IndexWriter will abort all buffered documents (since the last flush);
OOME usually falls into this category. If instead the exception
happens at an "OK" time, say when asking for the next Token from the
TokenStream, then we stop indexing that document and immediately mark
it as deleted, so the "first half" that had been indexed will never be
visible in the index -- i.e., it's "all or none".
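In code, handling the two cases might look roughly like this (a
sketch only; recovery details vary by release):

    try {
        writer.addDocument(doc);
    } catch (OutOfMemoryError oome) {
        // "Bad time": the internal RAM buffer may be corrupt, so all
        // docs buffered since the last flush are aborted.  Close this
        // writer and open a new one before indexing anything else.
        writer.close();
        throw oome;
    } catch (Exception e) {
        // "OK time", e.g. the TokenStream threw: only this doc is
        // affected.  It's marked deleted, the index stays consistent,
        // and it's safe to keep using the same writer.
    }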
Mike
Aditi Goyal wrote:
Thanks Anshum.
Although it raises another query: committing the current buffer will
commit the docs before it, but what will happen to the current doc,
the one that threw an error while adding a field to it -- will that
also get committed halfway?
Thanks a lot
Aditi
On Fri, Oct 3, 2008 at 2:12 PM, Anshum <[EMAIL PROTECTED]> wrote:
Hi Aditi,
I guess increasing the buffer size would be a solution here, but you
may not know the expected max doc size. In that case, the best way to
handle it would be a regular try/catch block in which you commit the
current buffer. At the least, you could just continue the loop after
doing whatever you wish to do in the exception-handling block.
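Something along these lines (just a sketch, with docs standing in for
your collection of Documents; commit() is the 2.4 name, older
releases used flush()):

    for (Document doc : docs) {
        try {
            writer.addDocument(doc);
        } catch (Exception e) {
            writer.commit();  // keep the docs indexed so far
            // log the bad doc and continue with the next one
        }
    }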
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............
On Fri, Oct 3, 2008 at 1:56 PM, Aditi Goyal <[EMAIL PROTECTED]>
wrote:
Hi Everyone,
I have an index which I am opening only once at a time. I keep adding
documents to it until I reach a limit of 500. After this, I close the
index and open it again. (This is done in order to save the time
taken by repeatedly opening and closing the index.)
Also, I have set setRAMBufferSizeMB to 16MB.
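Roughly, the loop looks like this (simplified; names are
illustrative):

    IndexWriter writer = new IndexWriter(path, analyzer, false);
    writer.setRAMBufferSizeMB(16.0);
    int count = 0;
    for (Document doc : docs) {
        writer.addDocument(doc);
        if (++count == 500) {
            writer.close();                                  // flush to disk
            writer = new IndexWriter(path, analyzer, false); // reopen
            count = 0;
        }
    }
    writer.close();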
If the document size itself is greater than 16MB, what will happen in
this case? It is throwing:
java.lang.OutOfMemoryError: Java heap space
Now, my query is: can we change something in the way we parse/index
to make it more memory-friendly, so that it doesn't throw this
exception? And can it be caught and overcome gracefully?
Thanks a lot
Aditi