Hi Pelit, My comments are inline.
On Sun, Jan 16, 2011 at 8:03 PM, Pelit Mamani <pelit.mam...@timetoknow.com> wrote:

> Hi,
>
> I'm maintaining some Lucene-based code, and we're trying to get control
> over result ordering (users aren't happy with the default).
> I know how to boost a Field or Document (very useful).
> But:
>
> 1) Is there a way to boost "OR" queries, based on the number of
> matched terms?
> So the OR query "lord rings" will first show the document "LORD of the
> RINGS" (which holds both words), and only later "selected jewels and RINGS"
> (which only holds one word).
> Is that what you call "Term Frequency"? And how do you boost it further?
> I did a bit of tinkering and got the impression Lucene would boost it by
> default, but not enough - it's sometimes overridden by other boosting
> factors (maybe the boost for short expressions).
>

The score of a document is affected by the fraction of query terms it
actually contains. This is encapsulated in Similarity.coord()
<http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html#coord%28int,%20int%29>.
You can override it to suit your needs.

> 2) Is there a way to boost based on "positions"?
> So "LORD of the RINGS" gets precedence over "LORD of the funny golden
> RINGS", because the search words are positioned closer to each other?
>

Span queries support positional/proximity matching and let you specify a
slop, but check whether they actually assign a higher score when the terms
are closer. Otherwise, if you build your queries programmatically, you can
create several span queries with different slop values, give each a
different boost, and combine them as OR clauses.

> 3) With wildcard searches, is there a way to boost documents that hold
> an exact match.
> So if I search for "ring*", I first see the exact match "story of a RING",
> and only later "a RINGING failure"
>
> Thanx a bunch.
>

--
---
Thanks & Regards
Umesh Prasad
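
To make the coordination factor from the answer to question 1 concrete, here is a minimal stand-alone sketch in plain Java with no Lucene dependency. The first method mirrors what DefaultSimilarity.coord(overlap, maxOverlap) computes in Lucene 2.9 (matched terms divided by total query terms). The class name CoordSketch and the squared variant are hypothetical, shown only as one way an override could favour documents that match more terms; in real code the same logic would go into an overridden coord() of a Similarity subclass.

```java
// Stand-alone illustration only; CoordSketch is a hypothetical name,
// not a Lucene class.
final class CoordSketch {

    // Fraction of query terms the document matched. This is the formula
    // DefaultSimilarity.coord(overlap, maxOverlap) uses in Lucene 2.9.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical sharper variant (an assumption, not part of Lucene):
    // squaring the fraction penalises partial matches more heavily.
    static float squaredCoord(int overlap, int maxOverlap) {
        float c = defaultCoord(overlap, maxOverlap);
        return c * c;
    }

    public static void main(String[] args) {
        // The OR query "lord rings" has two optional terms.
        // "LORD of the RINGS" matches both; "selected jewels and RINGS" matches one.
        System.out.println("default, both terms: " + defaultCoord(2, 2));
        System.out.println("default, one term:   " + defaultCoord(1, 2));
        System.out.println("squared, one term:   " + squaredCoord(1, 2));
    }
}
```

With the default formula a one-of-two match keeps half of its score, while the squared variant keeps only a quarter, so other factors such as length normalisation are less likely to push it above a document that matches both terms.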