See below:

On Feb 5, 2008 9:41 AM, SK R <[EMAIL PROTECTED]> wrote:
> Hi,
>      Thanks for your help, Erick.
>
> I changed my code to flush the writer before adding a document, which helps
> to reduce memory usage.
> Also, reducing mergeFactor and maxBufferedDocs to some level helps me to
> avoid this OOM error (even though the index size is ~1GB).
>
> But please clarify the doubts below:
>
>    "Make sure you flush your IndexWriter before attempting to index this
>    document."
>
>    - Is it good to call writer.flush() before adding every document to the
>      writer? Doesn't it affect performance of indexing or search? Is it
>      also similar to setting maxBufferedDocs=1?

No, this is not a good idea. I'd expect it to slow down indexing
significantly. What I was assuming is that you'd have something like

    if (the incoming document is huge) flush the index writer

just to free up all the memory you can (a rough sketch is appended after
the quoted thread below).

> Also guide me as to which one is relatively good (takes less time & memory)
> among these:
>      (i) create 4 indexes of 250MB each and merge them into a single index
>          by using writer.addIndexes(..)
>      (ii) create a 1GB index & optimize it?

Don't know. You have to measure your particular situation (a sketch of
option (i) is also appended below). There's some discussion (search the
archives) about using several threads to speed up indexing. Also, there's
the wiki page, see
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The first bullet point there is important here. Do you really need to
improve indexing speed? How long does it take, and how often do you build
it?

But perhaps I misread your original post. I *thought* you were talking
about indexing a 1G *document*. The size of the index shouldn't matter as
far as an OOM error is concerned.

But now that I re-read your original post, I should have also suggested
that you optimize in a different process than the one you index in, since
the implication is that they are separate indexes anyway.

Best
Erick

> Thanks & Regards
> RSK
>
> On Feb 4, 2008 9:23 PM, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> > Ummmm, index smaller documents? <G>
> >
> > You cannot expect to index a 1G doc with 512M of memory in the JVM.
> > The first thing I'd try is upping your JVM memory to the max your
> > machine will accept.
> >
> > Make sure you flush your IndexWriter before attempting to index this
> > document.
> >
> > But I would not be surprised if this failed to solve the problem.
> > What's in this massive document? Would it be possible to break it up
> > into smaller segments and index many sub-documents for this massive
> > doc? I also wonder what problem you're trying to solve by indexing
> > this doc. Is it a log file? I can't imagine a text document that big.
> > That's like a 100-volume encyclopedia, and I can't help but wonder
> > whether your users would be better served by indexing it in pieces.
> >
> > Best
> > Erick
> >
> > On Feb 4, 2008 10:25 AM, SK R <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >     I got an out-of-memory exception while indexing huge documents
> > > (~1GB) in one thread and optimizing some other (2 to 3) indexes in
> > > different threads. Max JVM heap size is 512MB. I'm using Lucene 2.3.0.
> > >
> > > Please suggest a way to avoid this exception.
> > >
> > > Regards
> > > RSK
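A minimal sketch of the "flush before a huge document" idea from the reply
above, assuming the Lucene 2.3 IndexWriter API. The index path, the
HUGE_DOC_BYTES cutoff, the RAM buffer size, and the "path"/"contents" field
names are placeholders, not recommendations:

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class BigDocIndexer {
        // Placeholder cutoff for "huge": flush buffered docs before indexing anything larger.
        private static final long HUGE_DOC_BYTES = 50L * 1024 * 1024;

        public static void indexAll(File[] files) throws IOException {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/path/to/index"),   // placeholder path
                    new StandardAnalyzer(), true);
            writer.setRAMBufferSizeMB(32.0);  // flush automatically once buffered docs use ~32 MB

            for (File f : files) {
                if (f.length() > HUGE_DOC_BYTES) {
                    writer.flush();           // push pending docs to disk so the heap is free for the big one
                }
                Document doc = new Document();
                doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
                // Pass a Reader so the file is streamed during indexing rather than held as one String.
                doc.add(new Field("contents", new FileReader(f)));
                writer.addDocument(doc);
            }
            writer.close();
        }
    }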
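And a sketch of option (i) from the question above: building four part-indexes
separately and merging them with writer.addIndexes(..). It again assumes the
Lucene 2.3 API; the directory paths are placeholders, and, as noted in the
reply, whether this actually beats building one 1GB index and optimizing it
has to be measured for the particular setup:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeParts {
        public static void main(String[] args) throws IOException {
            // Four ~250MB part-indexes, each built by its own writer beforehand (paths are placeholders).
            Directory[] parts = new Directory[] {
                FSDirectory.getDirectory("/indexes/part1"),
                FSDirectory.getDirectory("/indexes/part2"),
                FSDirectory.getDirectory("/indexes/part3"),
                FSDirectory.getDirectory("/indexes/part4")
            };
            IndexWriter merged = new IndexWriter(
                    FSDirectory.getDirectory("/indexes/merged"),  // placeholder path
                    new StandardAnalyzer(), true);
            merged.addIndexes(parts);  // copies the part-indexes in and optimizes the merged index
            merged.close();
        }
    }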