Oh, I think I have found some clues at:
[1] http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967
[2] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingSimilarity
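Link [2] points at the "Changing Similarity" section of the Lucene search package docs. In the Lucene 2.x line, the field-length factor comes from `Similarity.lengthNorm(String fieldName, int numTerms)`, which `DefaultSimilarity` computes as 1/sqrt(numTerms). One way to flatten it is to subclass `DefaultSimilarity` so that `lengthNorm` returns a constant, then install it with `IndexWriter.setSimilarity` and `Searcher.setSimilarity`. Norms are computed at index time, so existing documents would need to be reindexed. If your version supports it, `Field.setOmitNorms(true)` instead drops norms for a field entirely, which also discards that field's index-time boosts. The stand-alone sketch below does not use Lucene: `LengthNormDemo` and its `LengthNorm` interface are hypothetical stand-ins that only compare the two formulas.

```java
import java.util.Locale;

/**
 * Stand-alone illustration (no Lucene dependency) of field-length normalization.
 * LengthNorm is a hypothetical stand-in whose method shape mirrors Lucene 2.x's
 * Similarity.lengthNorm(String fieldName, int numTerms); it is not the Lucene API.
 */
public class LengthNormDemo {

    /** Hypothetical hook: returns the normalization factor for a field of numTerms tokens. */
    public interface LengthNorm {
        float lengthNorm(String fieldName, int numTerms);
    }

    /** Same formula as Lucene 2.x DefaultSimilarity: shorter fields get a larger factor. */
    public static final LengthNorm DEFAULT_NORM =
        (fieldName, numTerms) -> (float) (1.0 / Math.sqrt(numTerms));

    /** Constant factor: field length no longer influences the score. */
    public static final LengthNorm FLAT_NORM = (fieldName, numTerms) -> 1.0f;

    public static void main(String[] args) {
        int[] fieldLengths = {4, 16, 100};
        for (int n : fieldLengths) {
            System.out.printf(Locale.ROOT, "%3d terms: default=%.3f flat=%.3f%n",
                n, DEFAULT_NORM.lengthNorm("body", n), FLAT_NORM.lengthNorm("body", n));
        }
    }
}
```

With the default formula a 4-term field gets a factor of 0.500 and a 100-term field 0.100, so the same match in the shorter field weighs five times as much; with the flat variant both get 1.000.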
Thanks!
Jes

On 6/20/07, Jesse Prabawa <[EMAIL PROTECTED]> wrote:
Hi Steve,

Thanks for the advice and your detailed explanation. I have another question, though: I understand that Lucene normalizes scores based on field length. Is there a way for me to avoid this, or at least to have better control over how the scores are normalized?

Best regards,
Jes

On 6/19/07, Steven Rowe <[EMAIL PROTECTED]> wrote:
>
> Hi Jes,
>
> Jesse Prabawa wrote:
> > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > mentions that the position of the matches in the text does not affect
> > scoring. Is there any way I can make the position of the matches
> > affect scoring? For example, I want matches that occur at the
> > beginning to weigh more than those that occur elsewhere in the text.
> > I have just started using Lucene, so any help/advice is greatly
> > appreciated :)
>
> One quick way to get (something like) what you want is to place "the
> beginning" in a separate field from the rest of the document contents,
> then query both the "beginning" and "remainder" fields with the same
> query, boosting (i.e. weighting) the "beginning" field higher than the
> "remainder" field.
>
> E.g. (assumes SimpleAnalyzer and the default "OR" QueryParser operator):
>
> doc1: "This is the inception. Here is the rest."
>     "beginning" field: "this", "is", "the", "inception"
>     "remainder" field: "here", "is", "the", "rest"
>
> doc2: "Something else here. After the inception."
>     "beginning" field: "something", "else", "here"
>     "remainder" field: "after", "the", "inception"
>
> query: "What does inception mean?"
>   -> "beginning:(what does inception mean)^5 remainder:(what does inception mean)^1"
>
> The transformed query above is how it would look in QueryParser
> syntax[1] when querying both fields with the same query while boosting
> the "beginning" field (boost:5) higher than the "remainder" field (boost:1).
>
> You have to build this transformed query yourself; there is no facility
> in Lucene (that I'm aware of) for building multi-field queries with
> differently boosted fields.
>
> Both docs will match, but doc1 will score higher than doc2, since
> "inception" is in doc1's higher-weighted "beginning" field.
>
>
> Steve
>
> [1] http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> --
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
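Steve notes that the boosted multi-field query has to be built by hand. The Java sketch below shows one way to do that, together with the field split from his example, without any Lucene dependency. Its assumptions are illustrative, not part of Lucene: "the beginning" is taken to be the first sentence (text up to the first '.', '?' or '!'), and `tokenize` approximates SimpleAnalyzer by keeping runs of letters and lower-casing them. The class and method names (`BoostedFieldsSketch`, `splitFirstSentence`, `boostedQuery`) are hypothetical. In a real index the two parts would be added as separate fields and the generated string handed to `QueryParser`. Because `tokenize` keeps only letters, the generated string contains no QueryParser special characters that would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the "beginning"/"remainder" field approach, no Lucene dependency.
 * Assumption: the beginning is the first sentence; tokenize() approximates
 * SimpleAnalyzer (letter runs, lower-cased).
 */
public class BoostedFieldsSketch {

    /** Approximates SimpleAnalyzer: maximal runs of letters, lower-cased. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Splits a document into {beginning, remainder} after the first sentence end. */
    public static String[] splitFirstSentence(String doc) {
        for (int i = 0; i < doc.length(); i++) {
            char c = doc.charAt(i);
            if (c == '.' || c == '?' || c == '!') {
                return new String[] {doc.substring(0, i + 1), doc.substring(i + 1)};
            }
        }
        return new String[] {doc, ""}; // no sentence end: everything is "beginning"
    }

    /** Builds the same query against both fields, in QueryParser syntax, with per-field boosts. */
    public static String boostedQuery(String userQuery, int beginningBoost, int remainderBoost) {
        String terms = String.join(" ", tokenize(userQuery));
        return "beginning:(" + terms + ")^" + beginningBoost
            + " remainder:(" + terms + ")^" + remainderBoost;
    }

    public static void main(String[] args) {
        String[] parts = splitFirstSentence("This is the inception. Here is the rest.");
        System.out.println("beginning: " + tokenize(parts[0]));
        System.out.println("remainder: " + tokenize(parts[1]));
        System.out.println(boostedQuery("What does inception mean?", 5, 1));
    }
}
```

Running `main` prints the beginning tokens `[this, is, the, inception]`, the remainder tokens `[here, is, the, rest]`, and the same boosted query string as in Steve's example.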