Erick, I did get a hint for my problem: there was a bug in my code that was eating up the memory, which I figured out after a lot of effort. Thanks, all of you, for your suggestions.

But I still feel it takes a lot of time to index documents. It is taking around an hour or more to index a 330 MB file (90K documents). I am not sure how much time it should take, but it feels slow. I am using FSDirectory to store the indices.
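For reference, a minimal sketch of the kind of writer setup under discussion (assuming a Lucene 3.0-style API; the index path, analyzer choice, and buffer size below are placeholders, not taken from the thread):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class OpenWriter {
        public static void main(String[] args) throws Exception {
            // Open one writer for the whole run and reuse it for every document;
            // a larger RAM buffer means fewer flushes to disk while indexing.
            Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    true,                                   // create a new index
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(64);                  // example value, tune to the available heap
            // ... add or update the 90K documents here ...
            writer.close();                                 // close once, at the very end
        }
    }

Opening and closing the writer (or committing/optimizing) per document rather than once per run is a common cause of slow indexing, so it is worth checking that the writer is long-lived.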
Regards,
Ajay

Erick Erickson wrote:
>
> Interpolating from your data (and, by the way, some code examples would
> help a lot): if you're reopening the index reader to pick up recent
> additions but not closing it when a different one is returned from
> reopen, you'll consume resources. From the JavaDocs...
>
>   IndexReader newReader = reader.reopen();
>   if (newReader != reader) {
>     ...  // reader was reopened
>     reader.close();
>   }
>   reader = newReader;
>
> On Wed, Mar 3, 2010 at 7:20 AM, Ian Lea <ian....@gmail.com> wrote:
>
>> Lucene doesn't load everything into memory and can carry on running
>> consecutive searches or loading documents for ever without hitting OOM
>> exceptions. So if it isn't failing on a specific document, the most
>> likely cause is that your program is hanging on to something it
>> shouldn't. Previous docs? File handles? Lucene readers/searchers?
>>
>> --
>> Ian.
>>
>> On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>
>>> Ian,
>>> The point of the OOM exception varies; it is not fixed. It can come
>>> anywhere once memory use exceeds a certain point.
>>> I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
>>> When I said it fails after 70K docs, I meant approximately 70K
>>> documents, but if I reduce the memory it will OOM before 70K, so it is
>>> not specific to any particular document.
>>> To add each document I first search and then do an update, so I am not
>>> sure whether Lucene loads all the indices for the search and that's why
>>> it's going OOM? I am not sure how the search operation works in Lucene.
>>>
>>> Thanks
>>> Ajay
>>>
>>> Ian Lea wrote:
>>>>
>>>> Where exactly are you hitting the OOM exception? Have you got a stack
>>>> trace? How much memory are you allocating to the JVM? Have you run a
>>>> profiler to find out what is using the memory?
>>>>
>>>> If it runs OK for 70K docs and then fails, two possibilities come to
>>>> mind: either the 70K + 1 doc is particularly large, or you or Lucene
>>>> (unlikely) are holding on to something that you shouldn't be.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>>
>>>>> Hi Erick,
>>>>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it
>>>>> still hits the OOM error.
>>>>> I thought it is file-based indexing, so memory shouldn't be an issue,
>>>>> but you might be right that the searching is using a lot of memory?
>>>>> Is there a way to load documents in chunks, or some other way to make
>>>>> it scalable?
>>>>>
>>>>> Thanks in advance
>>>>> Ajay
>>>>>
>>>>> Erick Erickson wrote:
>>>>>>
>>>>>> I'm not following this entirely, but these docs may be huge by the
>>>>>> time you add context for every word in them. You say that you
>>>>>> "search the existing indices then I get the content and append....".
>>>>>> So is it possible that after 70K documents your additions become
>>>>>> so huge that you're blowing up? Have you taken any measurements
>>>>>> to determine how big the docs get as you index more and more
>>>>>> of them?
>>>>>>
>>>>>> If the above is off base, have you tried setting
>>>>>> IndexWriter.setRAMBufferSizeMB?
>>>>>>
>>>>>> HTH
>>>>>> Erick
>>>>>>
>>>>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> It might be a general question, but I couldn't find the answer yet.
>>>>>>> I have around 90K documents sizing around 350 MB.
>>>>>>> Each document contains a record which has some text content. For
>>>>>>> each word in this text I want to store the context for that word
>>>>>>> and index it, so I am reading each document and for each word in
>>>>>>> that document I am appending a fixed number of surrounding words.
>>>>>>> To do that, I first search the existing indices to see if the word
>>>>>>> already exists; if it does, I get the content, append the new
>>>>>>> context, and update the document. If no context exists, I create a
>>>>>>> document with fields "word" and "context" and add these two fields
>>>>>>> with the word value and the context value.
>>>>>>>
>>>>>>> I tried this in RAM, but after a certain number of docs it gave an
>>>>>>> out-of-memory error, so I thought to use the FSDirectory method,
>>>>>>> but surprisingly after 70K documents it also gave an OOM error. I
>>>>>>> have enough disk space but I am still getting this error. I am not
>>>>>>> sure why it gives this error even for disk-based indexing. I
>>>>>>> thought disk-based indexing would be slow but at least it would be
>>>>>>> scalable. Could someone suggest what the issue could be?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ajay
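For reference, a rough sketch of the search-then-update loop described in the original question quoted above (assuming a Lucene 3.0-style API; only the field names "word" and "context" come from the thread, while the class, method, and variable names are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class ContextUpdater {
        // One step of the workflow: look up a word, append the new context to any
        // previously stored context, and write the document back under the same term.
        public static void addContext(IndexWriter writer, IndexSearcher searcher,
                                      String word, String newContext) throws Exception {
            String context = newContext;
            TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
            if (hits.totalHits > 0) {
                Document existing = searcher.doc(hits.scoreDocs[0].doc);
                context = existing.get("context") + " " + newContext; // stored context keeps growing
            }
            Document doc = new Document();
            doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));
            // updateDocument deletes any earlier document(s) with this term and adds the new one.
            writer.updateDocument(new Term("word", word), doc);
        }
    }

Note that a searcher only sees documents visible when its reader was (re)opened, so a loop like this tends to force frequent reopens; if each reopen leaves the previous reader unclosed, memory use grows in exactly the way discussed earlier in the thread.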
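And one way the reopen idiom quoted from the JavaDocs above might be wrapped so that the old reader is always closed (again a sketch against a Lucene 3.0-style API; the class and method names are made up for illustration):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    public class SearcherHolder {
        private IndexReader reader;
        private IndexSearcher searcher;

        public SearcherHolder(Directory dir) throws Exception {
            reader = IndexReader.open(dir, true);   // read-only reader
            searcher = new IndexSearcher(reader);
        }

        // reopen() may return the same instance; only when a new reader comes
        // back is the old one closed and the searcher rebuilt on top of it.
        public IndexSearcher refresh() throws Exception {
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }
            return searcher;
        }

        public void close() throws Exception {
            reader.close();
        }
    }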