Hi Ori,
Before taking drastic rehosting measures, and introducing the
software complexity of splitting your application into pieces running
on separate machines, I'd recommend looking at how your document
data is distributed and how you're searching it. Here are some
questions that may help you find a less complex solution:
- Is your high ratio of unique terms to documents due to a unique
identifier in the documents? If so, are you performing wildcard or
range searches on that field?
- Are your queries "canned", i.e. hard-coded in form, or are they "ad
hoc", coming from users?
- Do your queries refer to every field you've indexed? On a similar
note, does your application use every field you've indexed or stored in
Lucene?
- How many documents do your queries typically hit? How many of those
hits do you actually use?
- How important is it that queries run against up-to-the-second data?
In other words, would the hits be pretty much as useful if the updates
were batched into a few runs per day, instead of applied continuously?
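On that last point, the batching idea can be sketched in a few lines. This is a minimal, hypothetical stand-in (the `BatchingIndexer` class and its `flush` count are illustrative, not Lucene API): it shows how buffering updates and flushing them in groups turns a thousand per-document index touches into a handful of batch operations. In a real application the `flush` body would hand the batch to Lucene's `IndexWriter` and reopen the searcher once per batch rather than once per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer incoming documents and apply them to the
// index in batches, instead of touching the index on every update.
class BatchingIndexer {
    private final List<String> pending = new ArrayList<>();
    private final int batchSize;
    int flushes = 0;  // how many times we actually touched the index

    BatchingIndexer(int batchSize) {
        this.batchSize = batchSize;
    }

    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) return;
        // In a real application, this is where the batch would go to
        // Lucene's IndexWriter, followed by a single searcher reopen.
        flushes++;
        pending.clear();
    }
}

public class Demo {
    public static void main(String[] args) {
        BatchingIndexer idx = new BatchingIndexer(100);
        for (int i = 0; i < 1000; i++) {
            idx.add("doc-" + i);
        }
        idx.flush();  // pick up any remainder
        // 1000 updates cost only 10 index touches
        System.out.println(idx.flushes);
    }
}
```

Whether this trade-off is acceptable depends entirely on your freshness requirement, which is why it's worth asking the question before re-architecting.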
One of the things I really like about Lucene is that one can quickly
whip up an application and it basically works. But, like most
databases, small differences in organization can produce
disproportionately large differences in performance when there are
millions of rows/records/entries. A little time spent examining data
distribution and access patterns can go a long way.
Good luck!
--MDC