If you're CPU-bound: I've had issues before with GC in long-running indexing
tasks loading very large volumes (hundreds of millions) of docs. I was seeing
a lot of CPU time tied up in GC.

I solved all of these problems by firing batches of indexing activity off in
separate processes, then killing each process immediately after use. This is
the most effective form of garbage collection you can get...
I don't see any advantage (other than retaining HotSpot JIT warm-up) in
keeping any of the resources created during an indexing batch after that
batch has been committed. Why burn CPU cycles garbage collecting when you
know there is nothing of value to be retained?
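
For illustration, a minimal sketch of that batch-per-process pattern. The
IndexBatchJob worker class, heap size, and path are all hypothetical; the
point is just that each batch runs in a throwaway JVM whose garbage dies
with it:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class BatchLauncher {
    public static void main(String[] args) throws Exception {
        for (int batch = 0; batch < 10; batch++) {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-Xmx512m",   // small, disposable heap
                    "IndexBatchJob",      // hypothetical worker class
                    "/path/to/index", String.valueOf(batch));
            pb.redirectErrorStream(true);
            Process p = pb.start();
            // Drain the child's output so it can't block on a full pipe.
            BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()));
            for (String line; (line = r.readLine()) != null; ) {
                System.out.println(line);
            }
            int exit = p.waitFor(); // process exits here; all indexing
                                    // garbage is reclaimed by the OS
            if (exit != 0) {
                throw new RuntimeException("Batch " + batch + " failed: " + exit);
            }
        }
    }
}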

Clearly this may not be a workable approach where batches of updates are
committed very frequently (i.e. at intervals measured in seconds, not
minutes), but for heavy ingest it can sit quite happily alongside a separate
long-running process used for servicing searches, without loading the search
process with the task of garbage collecting the indexing debris. The search
process can periodically reopen its IndexReader to see the new segments
created by the indexing processes.
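
A rough sketch of that search-side refresh, using IndexReader.reopen()
(available from Lucene 2.4 on), and assuming the search service holds its
own reader/searcher fields:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearchService {
    private IndexReader reader;     // opened once at startup
    private IndexSearcher searcher;

    // Call periodically (e.g. from a timer) to pick up segments
    // committed by the separate indexing processes.
    public synchronized void refresh() throws IOException {
        IndexReader newReader = reader.reopen(); // cheap no-op if unchanged
        if (newReader != reader) {
            reader.close();                      // release old segment files
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
    }
}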

Cheers
Mark




----- Original Message ----
From: Erick Erickson <erickerick...@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, 30 April, 2009 14:46:21
Subject: Re: Indexing becomes slow with time

This is surprising behavior, which is another way of saying that,
given what you've said so far, this shouldn't be happening. I'd
really look at system metrics, such as whether you're swapping.
In particular, you might want to try varying how big you allow
your memory footprint to grow before you flush; this is covered in
the doc Ian pointed out, under *Flush by RAM usage instead of
document count*.

There's no need to periodically optimize; just do that once at the
end, if you must.
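
For example (a sketch against the Lucene 2.4-era API; the path, analyzer,
and buffer size are illustrative, not recommendations):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"),
                new StandardAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setRAMBufferSizeMB(64.0); // flush by RAM usage, not doc count
        // ... all the addDocument() calls for the run ...
        // writer.optimize(); // if you optimize at all: once, here at the end
        writer.close();
    }
}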

Best
Erick

On Thu, Apr 30, 2009 at 6:23 AM, liat oren <oren.l...@gmail.com> wrote:

> Yes, I do run optimize...
>
> I did start looking at these tips in the last few days, but didn't think
> optimize would make it so slow.
>
> Thanks!
>
> 2009/4/30 Ian Lea <ian....@gmail.com>
>
> > Are you maybe running optimize after every n documents?  There are
> > lots of tips in
> > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed.
> >
> >
> > --
> > Ian.
> >
> >
> > On Thu, Apr 30, 2009 at 8:29 AM, liat oren <oren.l...@gmail.com> wrote:
> > > Hi,
> > >
> > > I noticed that when I start to index, it indexes 7 documents a second.
> > > After 30 minutes it goes down to 3 documents a second. After two hours
> > > it becomes very slow (I stopped it when the index reached 320MB and it
> > > was down to 1 document in almost a minute).
> > >
> > > As you can see, it slows down after only 2,000-3,000 documents.
> > > Should I split them into more indexes?
> > >
> > >
> > > Thanks,
> > > Liat
> > >
> >
>




