On Sat, Jul 7, 2007 at 8:19 PM, Chun Wei Ho wrote:
> We are currently running a search service with a single Lucene index
> of about 10 GB. We would like to find out:
>
> (a) What is the usual index size of everyone else? How large have
> Lucene indexes gone in production environments, and is there...
...h guidance on Lucene all this time :)
Not really a suggestion, but some points to consider:
(a) It depends greatly on your hardware, especially hard drive speed.
(b) Do you sort results (SortBy)? Each sort field needs an array in memory.
If you don't sort, reserving memory of about 10-15% of the index size will be enough.
(c) Maybe try to split by index c...
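To put rough numbers on point (b), here is a minimal Java sketch of the memory arithmetic. It assumes, as with Lucene's FieldCache of that era, that sorting on an integer field caches one 4-byte int per document (an array sized to maxDoc); sorting on strings costs more, so the result is a lower bound in that case. The 10-15% baseline is the poster's rule of thumb, not a Lucene constant, and the class name, method names, and the 100-million-document count in main are hypothetical.

```java
// Rough heap estimate for a searcher, under two stated assumptions:
//  1. the "10-15% of index size" rule of thumb from the post, and
//  2. each integer sort field caches one 4-byte int per document (int[maxDoc]).
// String sort fields need more (per-document ordinals plus the unique terms),
// so treat this as a lower bound when sorting on strings.
final class SortMemoryEstimate {
    private SortMemoryEstimate() {}

    /** Bytes for one int sort field: one 4-byte int per document slot. */
    static long intSortFieldBytes(long maxDoc) {
        return 4L * maxDoc;
    }

    /** Baseline heap from the rule of thumb, as a fraction (e.g. 0.10 to 0.15) of index size. */
    static long baselineBytes(long indexBytes, double fraction) {
        return Math.round(indexBytes * fraction);
    }

    /** Baseline plus one cached int array per integer sort field. */
    static long estimateBytes(long indexBytes, double fraction, long maxDoc, int intSortFields) {
        return baselineBytes(indexBytes, fraction) + intSortFields * intSortFieldBytes(maxDoc);
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024; // the 10 GB index from the question
        long docs = 100_000_000L;              // hypothetical document count
        long mb = 1024L * 1024;
        System.out.println("baseline (10%): " + baselineBytes(tenGb, 0.10) / mb + " MB");
        System.out.println("one int sort field: " + intSortFieldBytes(docs) / mb + " MB");
        System.out.println("total: " + estimateBytes(tenGb, 0.10, docs, 1) / mb + " MB");
    }
}
```

The point of the sketch is that sort-field cost scales with document count, not index size, so a large number of small documents can dominate the budget even when the index itself is modest.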
We are currently running a search service with a single Lucene index
of about 10 GB. We would like to find out:
(a) What is the usual index size of everyone else? How large have
Lucene indexes gone in production environments, and is there some sort of
optimal size that Lucene indexes should be?
(b) ...
Hi Edward,
We have indexed the MedLine data. We used the default StopAnalyzer on
the full-text fields (fields that are more than just dates or ids) and
the default Keyword for the other fields. So the short fields are stored
in the index, while the larger fields are only indexed.
In our a...
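The MedLine setup described above (stop-word analysis on full-text fields, a single untokenized keyword term for dates and ids) can be approximated in plain Java without the Lucene jar. This is a sketch, not Lucene's code: it assumes StopAnalyzer amounts to letter-based tokenization, lowercasing, and removal of a small English stop list, and the stop list below is assumed to match Lucene's classic English default. Whether a field is also stored is a separate choice; in the setup described, only the short keyword-style fields are stored. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class FieldAnalysisSketch {
    // Assumed to match Lucene's classic English stop-word set used by StopAnalyzer.
    static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will", "with");

    /** Full-text field: split on non-letters, lowercase, drop stop words. */
    static List<String> stopAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Iterate one past the end so the final token is flushed by a sentinel space.
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                String term = current.toString().toLowerCase(Locale.ROOT);
                if (!STOP_WORDS.contains(term)) {
                    terms.add(term);
                }
                current.setLength(0);
            }
        }
        return terms;
    }

    /** Short field (date, id): the whole value becomes one untokenized term. */
    static List<String> keywordAnalyze(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        System.out.println(stopAnalyze("The effect of aspirin on platelet aggregation"));
        System.out.println(keywordAnalyze("2005-05-13"));
    }
}
```

Note that letter-only tokenization drops digits, which is one reason ids and dates are better indexed as whole keyword terms.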
I'm investigating possible alternatives for indexing/searching a very
large dataset (2 TB) of XML data from the PubMed database[1]. Does
anyone have any experience working with indexes of this size? Granted,
the actual index size would be smaller than the source files, but I'm
just curious h...
We're using a single dual 3 GHz Xeon box, a Sun vx65, with the indexes stored on a
NetApp NearStore R100. I think you can either investigate whether your users
naturally group their searches and build indexes around those groups to minimize
individual index size, or prototype a distributed index...
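A distributed index of the kind suggested above usually means partitioning documents across several smaller indexes, querying each one, and merging the per-index top hits. The sketch below shows only the routing and the merge, with hypothetical names, and assumes each shard has already returned its own hits. One caveat: each shard scores with its own term statistics, so raw scores from different shards are only approximately comparable. Lucene of that era also shipped MultiSearcher and ParallelMultiSearcher for searching several indexes at once, which may be worth evaluating before building your own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class ShardedSearchSketch {
    /** One hit returned by a shard: a global document id and that shard's score for it. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return docId + "=" + score;
        }
    }

    /** Route a document to one of numShards indexes by a stable hash of its id. */
    static int shardFor(String docId, int numShards) {
        // String.hashCode is specified by the JDK, so routing is stable across runs.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Merge per-shard hit lists into a single global top k, best first. */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Min-heap on score: the head is always the weakest hit still in the running.
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (List<Hit> shardHits : perShard) {
            for (Hit h : shardHits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the lowest-scoring hit
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}
```

Hash routing spreads documents evenly; routing by the natural search groups mentioned above would instead let many queries hit a single shard.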
Lucene is an excellent choice.
If I were you, I would not store the unsearched fields in the index;
there's no clear benefit. Where you store the data depends on your needs.
I use flat files for what I'm doing, as I need them just for
display. If you need the functionality of a relational...
Unfortunately, our indexes will be performance-sensitive. Is Lucene
still a good choice? What kind of hardware are you using?
Also, what are the performance implications of having the additional
80 fields in the index just for display purposes?
Thanks,
Richard Krenek
On 5/13/05, Vince Taluski
Yes, you'll be fine with 100 million. I've got a couple of
non-performance-sensitive indexes that are more than double that (280M), with about 20
searchable fields as well. We get results back in the 10-20 second range
which is fine for our end users.
Vince
On 5/13/05, Richard Krenek
Hypothetically, I have 100 million records. Each record has 100+
fields. Only 20 of those fields need to be searched on; the rest
(including the 20) are just for display purposes.
Would it be best to just add the 20 fields to the index and keep the
rest in a relational database? What effect does all...
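The split Richard proposes (index the 20 searchable fields, keep the other fields in a relational database) comes down to routing each field by name and keeping a shared id so a search hit can be joined back to its display row. A sketch with hypothetical names, assuming records arrive as field-name-to-value maps and that the id field is always present:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class RecordSplitSketch {
    /** One record split in two: fields to index, and fields for the external (e.g. relational) store. */
    static final class Split {
        final Map<String, String> indexFields = new LinkedHashMap<>();
        final Map<String, String> displayOnlyFields = new LinkedHashMap<>();
    }

    static Split split(Map<String, String> record, Set<String> searchable, String idField) {
        String id = record.get(idField);
        if (id == null) {
            throw new IllegalArgumentException("record has no " + idField + " field");
        }
        Split s = new Split();
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (searchable.contains(e.getKey())) {
                s.indexFields.put(e.getKey(), e.getValue());
            } else if (!e.getKey().equals(idField)) {
                s.displayOnlyFields.put(e.getKey(), e.getValue());
            }
        }
        // The id goes to both sides so a search hit can be joined back to its display row.
        s.indexFields.put(idField, id);
        s.displayOnlyFields.put(idField, id);
        return s;
    }
}
```

With 20 of 100+ fields searchable, roughly four fifths of each record never touches the index, which is the size saving the thread is weighing against an extra lookup per displayed hit.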