Hi Ori,
Before taking drastic rehosting measures, and introducing the
software complexity of splitting your application into pieces running
on separate machines, I'd recommend looking at how your document
data is distributed and how you're searching it. Here are some
questions that may help you find a less complex solution:
- Is your high ratio of unique terms to documents due to a unique
identifier in the documents? If so, are you performing wildcard or
range searches on that field?
- Are your queries "canned", i.e. hard-coded in form, or are they "ad
hoc", coming from users?
- Do your queries refer to every field you've indexed? On a similar
note, does your application use every field you've indexed or stored in
Lucene?
- How many documents do your queries typically hit? How many of those
hits do you actually use?
- How important is it that queries run against up-to-the-second data?
In other words, would the hits be pretty much as useful if the updates
were batched into a few runs per day, instead of applied continuously?
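On that last point, the batching idea can be sketched in a few lines. This is a minimal, hypothetical stand-in (the `BatchingIndexer` class and its `flush` count are illustrative, not Lucene API): it shows how buffering updates and flushing them in groups turns a thousand per-document index touches into a handful of batch operations. In a real application the `flush` body would hand the batch to Lucene's `IndexWriter` and reopen the searcher once per batch rather than once per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer incoming documents and apply them to the
// index in batches, instead of touching the index on every update.
class BatchingIndexer {
    private final List<String> pending = new ArrayList<>();
    private final int batchSize;
    int flushes = 0;  // how many times we actually touched the index

    BatchingIndexer(int batchSize) {
        this.batchSize = batchSize;
    }

    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) return;
        // In a real application, this is where the batch would go to
        // Lucene's IndexWriter, followed by a single searcher reopen.
        flushes++;
        pending.clear();
    }
}

public class Demo {
    public static void main(String[] args) {
        BatchingIndexer idx = new BatchingIndexer(100);
        for (int i = 0; i < 1000; i++) {
            idx.add("doc-" + i);
        }
        idx.flush();  // pick up any remainder
        // 1000 updates cost only 10 index touches
        System.out.println(idx.flushes);
    }
}
```

Whether this trade-off is acceptable depends entirely on your freshness requirement, which is why it's worth asking the question before re-architecting.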
One of the things I really like about Lucene is that one can quickly
whip up an application and it basically works. But, like most
databases, small differences in organization can produce
disproportionately large differences in performance when there are
millions of rows/records/entries. A little time spent examining data
distribution and access patterns can go a long way.
Good luck!
--MDC