Re: Position of matches to affect scoring

Jesse Prabawa Wed, 20 Jun 2007 00:32:42 -0700

Oh I think I have found some clues at:
[1] http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967
[2] http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/package-summary.html#changingSimilarity


Thanks!

Jes

On 6/20/07, Jesse Prabawa <[EMAIL PROTECTED]> wrote:


Hi Steve,

Thanks for the advice and your detailed explanation. I have another
question, though. I understand that Lucene normalizes scores based on
field length. Is there a way for me to avoid this, or at least to have
better control over how the scores are normalized?

Best regards,

Jes
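
For reference, the field-length normalization asked about above can be
illustrated with plain arithmetic. This is only a sketch and does not use
Lucene itself. It assumes the default length norm is 1/sqrt(numTerms), which
is what DefaultSimilarity.lengthNorm(String fieldName, int numTerms) computes
in Lucene releases of this era as far as I know; check your version. The
class and method names below are made up for illustration.

```java
class LengthNormSketch {
    // Assumed default: the norm shrinks as the field gets longer (1/sqrt(numTerms)).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // What a custom Similarity could return instead to ignore field length.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int[] lengths = {4, 100, 10000};
        for (int n : lengths) {
            System.out.println(n + " terms: default=" + defaultLengthNorm(n)
                    + " flat=" + flatLengthNorm(n));
        }
    }
}
```

In Lucene the equivalent change would be a Similarity subclass that overrides
lengthNorm, as described in link [2] at the top of this thread. Norms are
stored at index time, so the documents would need to be re-indexed for a
change like this to take effect.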

On 6/19/07, Steven Rowe <[EMAIL PROTECTED]> wrote:
>
> Hi Jes,
>
> Jesse Prabawa wrote:
> > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > mentions that the position of the matches in the text does not affect
> > scoring. So is there any way that I can make the position of the
> > matches affect scoring? For example, I want matches that occur at the
> > beginning to weigh more than those that occur elsewhere in the text.
> > I have just started using Lucene so any help/advice is greatly
> > appreciated :)
>
> One quick way to get (something like) what you want is to place "the
> beginning" in a separate field from the rest of the document contents,
> then query both the "beginning" and "remainder" fields with the same
> query, boosting (i.e. weighting) the "beginning" field higher than the
> "remainder" field.
>
> E.g. (assumes SimpleAnalyzer, and default "OR" QueryParser operator):
>
>   doc1: "This is the inception.  Here is the rest."
>         "beginning" field: "this", "is", "the", "inception"
>         "remainder" field: "here", "is", "the", "rest"
>
>   doc2: "Something else here.  After the inception."
>         "beginning" field: "something", "else", "here"
>         "remainder" field: "after", "the", "inception"
>
> query: "What does inception mean?"
> -> "beginning:(what does inception mean)^5  remainder:(what does inception mean)^1"
>
> The transformed query shown above is how it would look in QueryParser
> syntax[1] to query both fields with the same query, while boosting the
> "beginning" field higher (boost:5) than the "remainder" field (boost:1).
>
> You have to build this transformed query yourself - there is no facility
> in Lucene (that I'm aware of) for building multi-field queries with
> differently boosted fields.
>
> Both docs will match, but doc1 will score higher than doc2, since
> "inception" is in doc1's higher-weighted "beginning" field.
>
>
> Steve
>
> [1] http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> --
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
>
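
Since the transformed query in Steve's message has to be built by hand, here
is a minimal sketch of one way to assemble it with plain string handling. The
class name BoostedFieldQuery and the build method are hypothetical, not part
of Lucene. The sketch assumes the user text has already been stripped of
QueryParser special characters (the "?" in the original question would
otherwise be read as a wildcard); QueryParser.escape may help there if your
Lucene version provides it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoostedFieldQuery {
    // Repeats the same user query against each field, appending that field's boost.
    // Output is in QueryParser syntax: field:(terms)^boost field:(terms)^boost ...
    static String build(String userQuery, Map<String, Integer> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey())
              .append(":(").append(userQuery).append(")^")
              .append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> boosts = new LinkedHashMap<>();
        boosts.put("beginning", 5);
        boosts.put("remainder", 1);
        System.out.println(build("what does inception mean", boosts));
    }
}
```

The resulting string can then be handed to QueryParser like any other query.
The field names and boosts (5 and 1) are the ones from the example above;
suitable values for a real index would need tuning.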
