Hi, just sharing some personal experiences in this domain,
We performed some benchmarks in a similar setup (indexing millions of documents with thousands of fields) to measure the impact of large number of fields on a Lucene index. We observed that more you have fields, more the dictionary will become large. In fact, Lucene is creating term index by concatenating the field name with the terms associated to this field. In the worst case scenario (when terms occur in every field), you can have M fields * N terms entries in the dictionary. As a consequence, a term lookup in the dictionary will take longer. This can have a significant impact when the cache is cold. When the cache is warm (when parts of the dictionary are in memory), the time overhead is not significant, even null.
If, in your data collection, the majority of the terms are locals to one field (i.e., when a term occurs only in one single field, like for example the timestamp of the document), your dictionary will not grow that much and therefore you will probably not notice the increase in time of dictionary lookups.
And, as erick explained previously, its depends also of the size of your index, and of your use case. If it is relatively small, the overhead will be imperceptible. However, if your index is large (millions of documents), you will probably notice the overhead the first time the query is executed (cold-cache).
I haven't tested with Lucene 3.0, but I have read somewhere (correct me if I am wrong) that this new version includes some optimisations for dictionary lookups, which should minimize the overhead.
-- Renaud Delbru On 30/12/09 16:18, Jason Tesser wrote:
I have a situation where I might have 1000 different types of Lucene Documents each with 10 or so fields with different names that get indexed. I am wondering if this is bad to do within Lucene. I end up with 10,000 fields within the index although any given document has only 10 or so. I was hoping not to have to have many indexes under the covers if I can avoid it but I don't want performance to suffer either. Any thoughts? Thanks, Jason Tesser dotCMS Lead Development Manager 1-305-858-1422 --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org