How to retrieve distinct field matches?

Mr Plate Thu, 15 Dec 2005 17:17:18 -0800

This puzzle has been bugging me for a while; I'm hoping there's an elegant way to handle it in Lucene.


DATA DESCRIPTION:

I've got an index of over 100,000 Documents. In addition to other fields, each of these Documents has 0 or more "category" field values. There are over 5,500 such categories (it's not a small set). Anywhere from 1 to 500+ Documents could belong to a single "category". This index does not get updated very often: anywhere from once a day to once a month. Indexing time is currently 15-30 minutes from start to finish/optimization.



PROBLEM:

I'd like to provide users a way to search these "category" values. For example, suppose the user searches for "fiction". They might see results of: { "fiction", "non-fiction" }. However, I'd like to do this search as quickly and efficiently as is reasonable. For example, if there are 500 Documents of category "fiction" and 400 of "non-fiction", I don't want to Sort and iterate through each Hit to weed out the duplicate values from my query.

For what it's worth, I imagine only 0-20 categories would match a given query.
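To make the cost concrete, here is a minimal plain-Java sketch of the "iterate every Hit and weed out duplicates" approach. It deliberately avoids Lucene classes. `NaiveDistinctCategories`, `HitDoc` and `matches` are hypothetical names, with `HitDoc` standing in for one matching Document and its "category" values. It also assumes that a query word matches a category value when it equals one of that value's words, so "fiction" matches "non-fiction".

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: plain Java, no Lucene API. HitDoc is a hypothetical stand-in
// for one matching Document and its stored "category" values.
public class NaiveDistinctCategories {

    static final class HitDoc {
        final List<String> categories; // 0 or more values per document

        HitDoc(List<String> categories) {
            this.categories = categories;
        }
    }

    // Assumed matching rule: the query word equals one of the value's words,
    // so "fiction" matches both "fiction" and "non-fiction".
    static boolean matches(String categoryValue, String queryWord) {
        String q = queryWord.toLowerCase(Locale.ROOT);
        for (String token : categoryValue.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(q)) {
                return true;
            }
        }
        return false;
    }

    // Visits every hit, so the work grows with the number of matching
    // documents (e.g. 900) even when only 2 distinct values come back.
    static Set<String> distinctMatches(List<HitDoc> hits, String queryWord) {
        Set<String> distinct = new TreeSet<>(); // sorted, duplicates dropped
        for (HitDoc hit : hits) {
            for (String category : hit.categories) {
                // A hit may also carry other, non-matching categories.
                if (matches(category, queryWord)) {
                    distinct.add(category);
                }
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<HitDoc> hits = List.of(
                new HitDoc(List.of("fiction", "mystery")),
                new HitDoc(List.of("non-fiction")),
                new HitDoc(List.of("fiction")));
        System.out.println(distinctMatches(hits, "fiction")); // [fiction, non-fiction]
    }
}
```

The result is correct, but every one of the 500 "fiction" and 400 "non-fiction" hits is visited just to produce two strings, which is exactly the overhead the question is trying to avoid.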



SIMPLEST SOLUTION I CAN THINK OF:

The best I can imagine is to maintain a separate Lucene index for each of these category types. Each Document in this separate index would probably have fields of "field_name" and "field_value" and would not contain any duplicates. For example, you might see a Document of field_name "category" and field_value "non-fiction". My query would hit this second index instead, to perform these metadata searches.
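The separate-index idea can be sketched in plain Java as a deduplicated set of (field_name, field_value) entries that is built once per reindex and then searched directly. `MetadataIndex` and `MetadataEntry` are hypothetical names. In a real setup each entry would be its own small Lucene Document, and the same word-matching rule as above is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Sketch only: an in-memory stand-in for the separate "metadata" index,
// holding one entry per distinct (field_name, field_value) pair.
public class MetadataIndex {

    static final class MetadataEntry {
        final String fieldName;  // e.g. "category"
        final String fieldValue; // e.g. "non-fiction"

        MetadataEntry(String fieldName, String fieldValue) {
            this.fieldName = fieldName;
            this.fieldValue = fieldValue;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MetadataEntry)) {
                return false;
            }
            MetadataEntry other = (MetadataEntry) o;
            return fieldName.equals(other.fieldName) && fieldValue.equals(other.fieldValue);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fieldName, fieldValue);
        }
    }

    // Insertion-ordered set: adding an existing pair again is a no-op.
    private final Set<MetadataEntry> entries = new LinkedHashSet<>();

    // Called once per source document at (re)index time, so 500 documents
    // in "fiction" still produce a single "category"/"fiction" entry.
    void addDocument(Map<String, List<String>> fields) {
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                entries.add(new MetadataEntry(field.getKey(), value));
            }
        }
    }

    int size() {
        return entries.size();
    }

    // Works over distinct values only, never over the matching documents.
    List<String> search(String fieldName, String queryWord) {
        String q = queryWord.toLowerCase(Locale.ROOT);
        List<String> results = new ArrayList<>();
        for (MetadataEntry e : entries) {
            if (!e.fieldName.equals(fieldName)) {
                continue;
            }
            for (String token : e.fieldValue.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (token.equals(q)) {
                    results.add(e.fieldValue);
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        MetadataIndex index = new MetadataIndex();
        index.addDocument(Map.of("category", List.of("fiction", "mystery")));
        index.addDocument(Map.of("category", List.of("fiction")));
        index.addDocument(Map.of("category", List.of("non-fiction")));
        System.out.println(index.size());                        // 3
        System.out.println(index.search("category", "fiction")); // [fiction, non-fiction]
    }
}
```

The size of this structure tracks the number of distinct values (about 5,500 categories), not the 100,000+ Documents. The linear scan in `search` is only there to keep the sketch short. A Lucene index over these entries would look the query term up in its inverted index instead of scanning every entry.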

I hope that makes sense; do you know of a more elegant way to handle this type of problem?



Thanks,

Tyler
