Lots of memory will help a lot. One of my DBSight customers is using an Intel Core Duo with everything configured in memory. The index size is about 700 MB. When I checked his system's average response time, it was 12 ms! I guess you can estimate what you will get from your beefy machine.
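As a rough way to check whether an index could be held entirely in memory, here is a minimal sketch that compares an estimated size against the JVM's heap limit. The class name, the method names, and the 25% headroom used in the test figures are my own assumptions for illustration, not anything from Lucene or DBSight. The 500,000 documents at 4 kB each come from the question quoted below, and that figure is the raw input size, so the real index may well be smaller.

```java
public class IndexFitsInHeap {
    // Rough size in bytes: document count times average document size.
    // This estimates the raw input, not the final index size.
    static long estimateBytes(long docCount, long avgDocBytes) {
        return docCount * avgDocBytes;
    }

    // True if the index plus a fractional headroom (hypothetical safety
    // margin for the rest of the application) fits under the heap limit.
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double headroom) {
        return indexBytes * (1.0 + headroom) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long estimate = estimateBytes(500_000L, 4L * 1024L);
        // maxMemory() reports the most heap this JVM will try to use (see -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Estimated input size: " + estimate / mb + " MB");
        System.out.println("JVM max heap: " + maxHeap / mb + " MB");
        System.out.println("Fits with 25% headroom: " + fitsInHeap(estimate, maxHeap, 0.25));
    }
}
```

A 32-bit JVM usually cannot be given a heap much beyond 1.5 to 2 GB, so an index near that size is where this check starts to fail.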
So it may be a good idea to try your index in a 64-bit JVM with the whole index in memory. For indexing, it's better to have faster disks, since indexing is an I/O-intensive process.

Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 10/12/06, Doron Cohen <[EMAIL PROTECTED]> wrote:
"Scott Smith" <[EMAIL PROTECTED]> wrote on 12/10/2006 14:14:57:

> Suppose I want to index 500,000 documents (average document size is
> 4kBs). Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it). I would guess
> the index would be around 2GB.

The input data size is ~2GB but the index itself may be smaller, particularly if not storing fields/termvectors.

> Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> Core 2 Duo, Windows XP). Does anyone have any idea what kinds of search
> times I can expect for moderately complicated searches (several sets of
> keywords against several fields)? Are there things I can do to increase
> search performance? For example, does Lucene like lots of RAM, lots of
> CPU, faster HD, all of the above? Am I better splitting the index file
> into 2 (N?) versions and search on multiple indexes simultaneously?
>
> Anyone have any thoughts about this?

Indexing time (at least for plain text or simple HTML) would be something near half an hour, so you might just give it a try.

If the index size turns out to be small enough to reside in RAM (and you don't need the RAM for other activities at the same time) you could try RAMDirectory.

I wonder if anyone has ever compared RAMDir to a "hot" searcher above FSDir. It seems that having all the index data in RAM would be faster than relying on IO caching by the system, but if for some reason the RAMDir cannot be in RAM all the time, I would assume that paging in/out would make it more costly than using FSDir and just counting on system IO caching. In the latter case see relevant discussions on warming a searcher and caching filters.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]