Erick, I did get a hint for my problem: there was a bug in my code that was eating up the memory, which I figured out after a lot of effort. Thanks, all of you, for your suggestions.

But I still feel it takes a lot of time to index documents. It is taking around an hour or more to index a 330 MB file (90K documents). I am not sure how much time it should take, but it feels slow. I am using FSDirectory to store the indices.
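For reference, a minimal sketch of the kind of writer setup under discussion (assuming a Lucene 3.0-style API; the index path, analyzer choice, and buffer size below are placeholders, not taken from the thread):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class OpenWriter {
        public static void main(String[] args) throws Exception {
            // Open one writer for the whole run and reuse it for every document;
            // a larger RAM buffer means fewer flushes to disk while indexing.
            Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    true,                                   // create a new index
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(64);                  // example value, tune to the available heap
            // ... add or update the 90K documents here ...
            writer.close();                                 // close once, at the very end
        }
    }

Opening and closing the writer (or committing/optimizing) per document rather than once per run is a common cause of slow indexing, so it is worth checking that the writer is long-lived.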
Regards,
Ajay

Erick Erickson wrote:
>
> Interpolating from your data (and, by the way, some code examples would
> help a lot): if you're reopening the index reader to pick up recent
> additions but not closing it when a different one is returned from
> reopen, you'll consume resources. From the JavaDocs...
>
>   IndexReader newReader = reader.reopen();
>   if (newReader != reader) {
>     ...  // reader was reopened
>     reader.close();
>   }
>   reader = newReader;
>
> On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea <ian....@gmail.com> wrote:
>
>> Lucene doesn't load everything into memory and can carry on running
>> consecutive searches or loading documents for ever without hitting OOM
>> exceptions. So if it isn't failing on a specific document, the most
>> likely cause is that your program is hanging on to something it
>> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>>
>> --
>> Ian.
>>
>> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>
>>> Ian,
>>> The point of the OOM exception varies; it is not fixed. It can come
>>> anywhere once memory use exceeds a certain point.
>>> I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
>>> When I said it fails after 70K docs, I meant approximately 70K
>>> documents, but if I reduce the memory it will OOM before 70K, so it is
>>> not specific to any particular document.
>>> To add each document I first search and then do an update, so I am not
>>> sure whether Lucene loads all the indices for the search and that's why
>>> it's going OOM? I am not sure how the search operation works in Lucene.
>>>
>>> Thanks
>>> Ajay
>>>
>>> Ian Lea wrote:
>>>>
>>>> Where exactly are you hitting the OOM exception? Have you got a stack
>>>> trace? How much memory are you allocating to the JVM? Have you run a
>>>> profiler to find out what is using the memory?
>>>>
>>>> If it runs OK for 70K docs and then fails, two possibilities come to
>>>> mind: either the 70K + 1 doc is particularly large, or you or Lucene
>>>> (unlikely) are holding on to something that you shouldn't be.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>>
>>>>> Hi Erick,
>>>>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it
>>>>> still hits the OOM error.
>>>>> I thought it is file-based indexing, so memory shouldn't be an issue,
>>>>> but you might be right that the searching is using a lot of memory?
>>>>> Is there a way to load documents in chunks, or some other way to make
>>>>> it scalable?
>>>>>
>>>>> Thanks in advance
>>>>> Ajay
>>>>>
>>>>> Erick Erickson wrote:
>>>>>>
>>>>>> I'm not following this entirely, but these docs may be huge by the
>>>>>> time you add context for every word in them. You say that you
>>>>>> "search the existing indices then I get the content and append....".
>>>>>> So is it possible that after 70K documents your additions become
>>>>>> so huge that you're blowing up? Have you taken any measurements
>>>>>> to determine how big the docs get as you index more and more
>>>>>> of them?
>>>>>>
>>>>>> If the above is off base, have you tried setting
>>>>>> IndexWriter.setRAMBufferSizeMB?
>>>>>>
>>>>>> HTH
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> It might be a general question, but I couldn't find the answer yet.
>>>>>>> I have around 90K documents sizing around 350 MB.
>>>>>>> Each document contains a record which has some text content. For
>>>>>>> each word in this text I want to store the context for that word
>>>>>>> and index it, so I am reading each document and for each word in
>>>>>>> that document I am appending a fixed number of surrounding words.
>>>>>>> To do that, I first search the existing indices to see if the word
>>>>>>> already exists; if it does, I get the content, append the new
>>>>>>> context, and update the document. If no context exists, I create a
>>>>>>> document with fields "word" and "context" and add these two fields
>>>>>>> with the word value and the context value.
>>>>>>>
>>>>>>> I tried this in RAM, but after a certain number of docs it gave an
>>>>>>> out-of-memory error, so I thought to use the FSDirectory method,
>>>>>>> but surprisingly after 70K documents it also gave an OOM error. I
>>>>>>> have enough disk space but I am still getting this error. I am not
>>>>>>> sure why it gives this error even for disk-based indexing. I
>>>>>>> thought disk-based indexing would be slow but at least it would be
>>>>>>> scalable. Could someone suggest what the issue could be?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ajay
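For reference, a rough sketch of the search-then-update loop described in the original question quoted above (assuming a Lucene 3.0-style API; only the field names "word" and "context" come from the thread, while the class, method, and variable names are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class ContextUpdater {
        // One step of the workflow: look up a word, append the new context to any
        // previously stored context, and write the document back under the same term.
        public static void addContext(IndexWriter writer, IndexSearcher searcher,
                                      String word, String newContext) throws Exception {
            String context = newContext;
            TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
            if (hits.totalHits > 0) {
                Document existing = searcher.doc(hits.scoreDocs[0].doc);
                context = existing.get("context") + " " + newContext; // stored context keeps growing
            }
            Document doc = new Document();
            doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));
            // updateDocument deletes any earlier document(s) with this term and adds the new one.
            writer.updateDocument(new Term("word", word), doc);
        }
    }

Note that a searcher only sees documents visible when its reader was (re)opened, so a loop like this tends to force frequent reopens; if each reopen leaves the previous reader unclosed, memory use grows in exactly the way discussed earlier in the thread.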
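And one way the reopen idiom quoted from the JavaDocs above might be wrapped so that the old reader is always closed (again a sketch against a Lucene 3.0-style API; the class and method names are made up for illustration):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    public class SearcherHolder {
        private IndexReader reader;
        private IndexSearcher searcher;

        public SearcherHolder(Directory dir) throws Exception {
            reader = IndexReader.open(dir, true);   // read-only reader
            searcher = new IndexSearcher(reader);
        }

        // reopen() may return the same instance; only when a new reader comes
        // back is the old one closed and the searcher rebuilt on top of it.
        public IndexSearcher refresh() throws Exception {
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }
            return searcher;
        }

        public void close() throws Exception {
            reader.close();
        }
    }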