Lots of memory will help a lot. One of my DBSight customers runs on an
Intel Core Duo with everything configured to stay in memory. The index
size is about 700MB. When I checked his system's average response time,
it was 12ms! I guess you can estimate what you will get from your beefy
machine.

So it may be a good idea to try your index in a 64-bit JVM with the
whole index in memory.

For indexing, faster disks will help, since it is an IO-intensive process.

Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com


On 10/12/06, Doron Cohen <[EMAIL PROTECTED]> wrote:
"Scott Smith" <[EMAIL PROTECTED]> wrote on 12/10/2006 14:14:57:

> Suppose I want to index 500,000 documents (average document size is
> 4kB).  Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it).  I would guess
> the index would be around 2GB.

The input data size is ~2GB but the index itself may be smaller,
particularly if not storing fields/termvectors.
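
The arithmetic behind the ~2GB figure, and whether an index would fit in
the 2GB of RAM mentioned in the question, can be sketched as below. This
is only a back-of-the-envelope estimate: the 30% index-to-input ratio and
the 512MB reserved for the JVM and OS are hypothetical values chosen for
illustration, not measurements. The real ratio depends on the analyzer
and on which fields and term vectors are stored.

```java
public class IndexSizeEstimate {

    // Raw input size in bytes: document count times average document size.
    static long inputBytes(long docCount, long avgDocBytes) {
        return docCount * avgDocBytes;
    }

    // Estimated index size as an integer percentage of the input size.
    // The percentage is an assumption supplied by the caller.
    static long indexBytes(long inputBytes, int percentOfInput) {
        return inputBytes * percentOfInput / 100;
    }

    // True if the index plus the reserved headroom fits in the given RAM.
    static boolean fitsInRam(long indexBytes, long ramBytes, long reservedBytes) {
        return indexBytes + reservedBytes <= ramBytes;
    }

    public static void main(String[] args) {
        long input = inputBytes(500_000L, 4L * 1024L);   // figures from the question
        long ram = 2L * 1024 * 1024 * 1024;              // 2GB machine
        long reserved = 512L * 1024 * 1024;              // hypothetical headroom
        long index = indexBytes(input, 30);              // hypothetical 30% ratio

        System.out.println("input bytes: " + input);
        System.out.println("estimated index bytes: " + index);
        System.out.println("index fits in RAM: " + fitsInRam(index, ram, reserved));
        System.out.println("raw input fits in RAM: " + fitsInRam(input, ram, reserved));
    }
}
```

Building the index once and checking the size of the index directory
gives the real number to plug in instead of the assumed ratio.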

> Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> Core 2 Duo, Windows XP).  Does anyone have any idea what kinds of search
> times I can expect for moderately complicated searches (several sets of
> keywords against several fields)?  Are there things I can do to increase
> search performance?  For example, does Lucene like lots of RAM, lots of
> CPU, faster HD, all of the above?  Am I better splitting the index file
> into 2 (N?) versions and search on multiple indexes simultaneously?
>
> Anyone have any thoughts about this?

Indexing time (at least for plain text or simple HTML) would be something
near half an hour, so you might just give it a try. If the index size turns
out to be small enough to reside in RAM (and you don't need the RAM for
other activities at the same time) you could try RAMDirectory. I wonder if
anyone ever compared RAMDir to a "hot" searcher above FSDir. It seems that
having all the index data in RAM would be faster than relying on IO caching
by the system, but if for some reason the RAMDir cannot be in RAM all the
time, I would assume that paging in/out would make it more costly than using
FSDir and just counting on system IO caching. In the latter case see the
relevant discussions on warming a searcher and caching filters.
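
For the FSDir case, one crude way to get the system IO cache "hot" before
the first queries arrive is to read every file in the index directory
through once. The sketch below uses only the JDK and is not Lucene's own
searcher warming, which would run representative queries against the
searcher. The class name and the idea of passing the index directory as
the first argument are assumptions made for illustration, and whether the
OS keeps the pages cached depends on how much free memory it has.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IndexCacheWarmer {

    // Reads every regular file directly inside indexDir from start to end
    // and returns the total number of bytes read. The data is discarded;
    // the point is only to pull the files through the OS file cache.
    static long warm(Path indexDir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(indexDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the index directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long bytes = warm(dir);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + bytes + " bytes in " + millis + " ms");
    }
}
```

Running it twice in a row and comparing the two timings gives a rough idea
of how much of the index the system is actually keeping cached.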


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



