Hello Lucene Experts, I wonder if someone might be able to shed some insight on this interesting scoring question: The problem: Build a search query that will return [ordered] hits by the top number of occurences of field values across matched documents (or as close to this as possible). The built-in scoring is great for scoring number of hits within a document, but is there an efficient way to do this across the same field in a set of matched documents? (maybe scoring isn't the best way?) Example: Let's say you have an index containing book information. Each document has a 'title' field. Let's say the index contains 100 entries, with: 65 'title's containing the word 'tiger' 21 containing 'lion' 6 containing 'panther' 5 containing 'kitten' 3 containing 'slug' What would be the best way to build a query such that returned documents are ordered in this way: Rank Value Occurences ================================ 1 tiger 65 2 lion 21 3 panther 6 4 kitten 5 5 slug 3 I can, of course, build a standard query, traverse the returned documents and build such a list, but if the returned query had many 100,000's of hits, the performance would degrade linearly, particularly if only the 'Top 5' are actually required.
One idea is to maintain a separate index with this information - the main problem with this is that you essentially need to know what you're searching for at index-time, which isn't ideal. Has anyone come across and solved this particular issue using Lucene? Many thanks, Peter _________________________________________________________________ Add your Gmail and Yahoo! Mail email accounts into Hotmail - it's easy http://clk.atdmt.com/UKM/go/186394592/direct/01/