Lucene is an excellent choice.

If I were you, I would not store the unsearched fields in the index. There's no clear benefit. Where you store the data depends on your needs: I use flat files, since I need the data only for display. If you need the functionality of a relational database, that's a perfectly acceptable solution as well.
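
For what it's worth, here is a rough sketch of that split in plain Java. It is not Lucene code: the class `SplitStoreSketch`, its methods, and the sample field names are all made up for illustration, and in-memory maps stand in for both the search index and the external store (flat files or a database). The point is only that the searched fields feed the index while the full record lives elsewhere and is fetched by id for the hits you actually display.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: searched fields go into an index, display data stays outside it. */
public class SplitStoreSketch {
    // "field:token" -> sorted record ids; stands in for the search index
    private final Map<String, Set<Integer>> index = new HashMap<>();
    // record id -> every field of the record; stands in for flat files or a database table
    private final Map<Integer, Map<String, String>> displayStore = new HashMap<>();

    /** Index only the fields named in searchable; keep the whole record in the display store. */
    public void add(int id, Map<String, String> record, Set<String> searchable) {
        displayStore.put(id, new HashMap<>(record));
        for (String field : searchable) {
            String value = record.get(field);
            if (value == null) {
                continue;
            }
            for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(field + ":" + token, k -> new TreeSet<>()).add(id);
                }
            }
        }
    }

    /** Return the ids of records whose given field contains the token (case-insensitive). */
    public List<Integer> search(String field, String token) {
        Set<Integer> hits = index.get(field + ":" + token.toLowerCase(Locale.ROOT));
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    /** Load display data for one hit; only called for results that will be shown. */
    public Map<String, String> load(int id) {
        return displayStore.get(id);
    }

    public static void main(String[] args) {
        SplitStoreSketch store = new SplitStoreSketch();
        Map<String, String> record = new HashMap<>();
        record.put("name", "Jane Doe");
        record.put("state", "Colorado");
        record.put("hairColor", "brown"); // display only, never indexed
        store.add(1, record, Set.of("name", "state"));

        for (int id : store.search("state", "colorado")) {
            System.out.println(id + " -> " + store.load(id));
        }
    }
}
```

With this layout the index size depends only on the searched fields, and the cost of the display fields is paid once per displayed hit rather than once per indexed document.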

I currently have roughly 100,000 records, where each "record" is the full text of a page from a book. I use a dual-processor Pentium 4 and get large result sets (10,000 hits) back in under 1/10 of a second. I've given no thought whatsoever to keeping my code tight or efficient, and I'm still getting great results.
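
Since your indexes are performance sensitive, it is probably worth timing queries against your own data rather than relying on my numbers. Below is a minimal timing sketch in plain Java. The class `LatencySketch` and the method `medianMillis` are hypothetical names, and the `Supplier` passed in is a placeholder for whatever search call you actually make; nothing here is Lucene API.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical sketch: measure the median wall-clock latency of a query. */
public class LatencySketch {

    /**
     * Runs the query `warmup` times without measuring (to let caches and the JIT settle),
     * then `runs` times with measurement, and returns the median time in milliseconds.
     */
    public static double medianMillis(Supplier<?> query, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            query.get();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        // Middle element for an odd count, mean of the two middle elements for an even count
        long median = (runs % 2 == 1)
                ? nanos[runs / 2]
                : (nanos[runs / 2 - 1] + nanos[runs / 2]) / 2;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the lambda with a real search call.
        double ms = medianMillis(() -> Math.sqrt(12345.678), 100, 1001);
        System.out.println("median latency (ms): " + ms);
    }
}
```

The median is used rather than the mean so that a single slow run (a garbage collection pause, for example) does not dominate the figure.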

Richard Krenek wrote:

Unfortunately our indexes will be performance sensitive. Is Lucene
still a good choice?  What kind of hardware are you using?

Also, what are the performance implications of having the additional
80 fields in the index just for display purposes?

Thanks,
Richard Krenek



On 5/13/05, Vince Taluskie <[EMAIL PROTECTED]> wrote:


Yes, you'll be fine with 100 million. I've got a couple of indexes that are
not performance sensitive and are more than double that (280M), with about 20
searchable fields as well. We get results back in the 10-20 second range,
which is fine for our end users.

Vince


On 5/13/05, Richard Krenek <[EMAIL PROTECTED]> wrote:


Hypothetically, I have 100 million records. Each record has 100+
fields. Only 20 of those fields need to be searched on; the rest
(including the 20) are just for display purposes.
Would it be best to add just the 20 fields to the index and keep the
rest in a relational database? What effect does all that fluff data
have on index size and search speed? Does it matter that some of
the fluff data is repeated a lot? (Certain fields might just contain
the state a person lives in, the color of their hair, their number of
fingers, etc.)
Our indexes are going to be very big; 100 million+ is not an
exaggeration. Will Lucene handle this OK? I have created indexes in the
8-30 million range, but never this big in either the number of documents
or the number of fields.


Thanks for any info you can provide.




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





--
@work: vince.taluskie (at) cexp.com
       Corporate Express; Technical Architect
       Phone: 303 664 2660
@home: vince (at) taluskie.com
       Louisville, CO
       http://www.taluskie.com










--
Dan Funk
Software Engineer

Information Technology Solutions
Battelle Charlottesville Operations
1000 Research Park Boulevard, Suite 105
Charlottesville, Virginia 22911

434.984.0951 x244
434.984.0947 (fax)
[EMAIL PROTECTED]





