Lucene scales with the number of unique terms in the index and not the
number of documents nor the size of the documents. Typically, you
should have at most 1 million unique terms for a set of 10 million
documents.

So the fact that you have 13 million unique terms in 10 million
documents tells me that the characteristics of your document set don't
follow the typical term growth rate that Lucene is designed for. You
might actually be better off using a database for storing and
searching these documents.

On 1/24/06, Ori Schnaps <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Apologies if this question has being asked before on this list.
>
> I am working on an application with a Lucene index whose performance
> (response time for a query) has started degrading as its size has
> increase.
>
> The index is made up of approximately 10 million documents that have
> 11 fields.  The average document size is less then 1k.  The index has
> a total of 13 million terms.  The total index size is about 2.2 gig.
> The index is being updated relatively aggressively.  In a 24hr period
> there may be any where from 500k to 3 million updates.
>
> What I have noticed is that as the document number increased from 6
> million to 10 million, the response time for a query has continually
> increased from 0.5 seconds to ~2+ seconds.
>
> I am using a Java j2sdk1.4.2_08 and Lucene 1.4.3.  The container is
> tomcat and the java process is allocated 2gig heap.  The heap is
> shared between the Lucene index and the end user application.
>
> Our initial inclination is to pull the index out of the application
> and load it fully into memory on a separate box.  Anyone have any
> experience with performance for index of this nature and what kind of
> request time should be expected from Lucene?
>
> thanks
> ori
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


--
Dave Kor, Research Assistant
Center for Information Mining and Extraction
School of Computing
National University of Singapore.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to