Hi Jake,
Many thanks for your quick reply. I shall check these out. Thanks! Peter > Date: Sun, 22 Nov 2009 09:20:24 -0800 > Subject: Re: Top field count scoring across documents > From: jake.man...@gmail.com > To: java-user@lucene.apache.org > > Peter, > > You want to do a facet query. This kind of functionality is not in > Lucene-core (sadly), but both Solr (the fully featured search application > built on Lucene) and bobo-browse (just a library, like Lucene itself) are > open-source and work with Lucene to provide faceting capabilities for you. > > -jake > > On Sun, Nov 22, 2009 at 8:42 AM, Peter 4U <pete...@hotmail.com> wrote: > > > > > Hello Lucene Experts, > > > > I wonder if someone might be able to shed some insight on this interesting > > scoring question: > > > > The problem: > > Build a search query that will return [ordered] hits by the top number of > > occurences of field values across matched documents (or as close to this as > > possible). > > The built-in scoring is great for scoring number of hits within a document, > > but is there an efficient way to do this across the same field in a set of > > matched documents? (maybe scoring isn't the best way?) > > > > Example: > > Let's say you have an index containing book information. Each document has > > a 'title' field. > > Let's say the index contains 100 entries, with: > > 65 'title's containing the word 'tiger' > > 21 containing 'lion' > > 6 containing 'panther' > > 5 containing 'kitten' > > 3 containing 'slug' > > > > What would be the best way to build a query such that returned documents > > are ordered in this way: > > Rank Value Occurences > > ================================ > > 1 tiger 65 > > 2 lion 21 > > 3 panther 6 > > 4 kitten 5 > > 5 slug 3 > > > > I can, of course, build a standard query, traverse the returned documents > > and build such a list, but if the returned query had many 100,000's of hits, > > the performance would degrade linearly, particularly if only the 'Top 5' are > > actually required. > > > > > > One idea is to maintain a separate index with this information - the main > > problem with this is that you essentially need to know what you're searching > > for at index-time, which isn't ideal. > > > > > > Has anyone come across and solved this particular issue using Lucene? > > > > Many thanks, > > Peter > > > > > > > > _________________________________________________________________ > > Add your Gmail and Yahoo! Mail email accounts into Hotmail - it's easy > > http://clk.atdmt.com/UKM/go/186394592/direct/01/ > > _________________________________________________________________ Use Hotmail to send and receive mail from your different email accounts http://clk.atdmt.com/UKM/go/186394592/direct/01/