"Scott Smith" <[EMAIL PROTECTED]> wrote on 12/10/2006 14:14:57:
> Suppose I want to index 500,000 documents (average document size is
> 4kB). Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it). I would guess
> the index would be around 2GB.

The input data size is ~2GB, but the index itself may be smaller, particularly if you are not storing fields or term vectors.

> Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> Core 2 Duo, Windows XP). Does anyone have any idea what kinds of search
> times I can expect for moderately complicated searches (several sets of
> keywords against several fields)? Are there things I can do to increase
> search performance? For example, does Lucene like lots of RAM, lots of
> CPU, faster HD, all of the above? Am I better splitting the index into
> 2 (N?) pieces and searching multiple indexes simultaneously?
>
> Anyone have any thoughts about this?

Indexing time (at least for plain text or simple HTML) should be something like half an hour, so you might just give it a try.

If the index turns out to be small enough to fit in RAM (and you don't need that RAM for other work at the same time), you could try RAMDirectory. I wonder whether anyone has ever compared a RAMDirectory to a "hot" searcher over an FSDirectory. Having all the index data in RAM seems like it should be faster than relying on the operating system's I/O caching, but if for some reason the RAMDirectory cannot stay resident in physical memory the whole time, I would assume the paging in and out would make it more costly than just using an FSDirectory and counting on the system's I/O cache. In the latter case, see the relevant discussions on warming a searcher and caching filters.
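To make the two options concrete, here is a rough sketch (not a benchmark) against the Lucene API of that era: open the same index either straight off disk with FSDirectory or copied fully into the heap with RAMDirectory, then run one representative query as a warming pass before serving real traffic. The index path, field name, and query text below are placeholders, not anything from your setup.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class SearcherWarmup {
    public static void main(String[] args) throws Exception {
        String indexPath = "/path/to/index";   // placeholder
        boolean loadIntoRam = true;            // flip to compare the two setups

        // RAMDirectory(String) copies the whole on-disk index into the heap,
        // so only use it if the ~2GB index really fits alongside the JVM;
        // otherwise open the index in place with FSDirectory.
        Directory dir = loadIntoRam
                ? new RAMDirectory(indexPath)
                : FSDirectory.getDirectory(indexPath, false);

        IndexSearcher searcher = new IndexSearcher(dir);

        // "Warm" the searcher with a typical query so the first real user
        // does not pay for cold OS file caches (mostly matters for FSDirectory).
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        Query warmup = parser.parse("typical query terms");
        Hits hits = searcher.search(warmup);
        System.out.println("warmed up, " + hits.length() + " hits");

        searcher.close();
    }
}

Keeping a single long-lived IndexSearcher and reusing it for all queries matters more than which Directory you pick; reopening the index per query throws away both the warmed caches and any cached filters.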