Hi,

Have you looked at using the HitCollector?

Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Jesse Prabawa <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 9:17:22 AM
Subject: Re: Position of matches to affect scoring

Hi Steve,

Thanks for the advice and your detailed explanation. I have another
question, though. I understand that Lucene normalizes the scores based
on field length. Is there a way for me to avoid this, or perhaps to have
better control over how the scores are normalized?

Best regards,

Jes

On 6/19/07, Steven Rowe <[EMAIL PROTECTED]> wrote:
>
> Hi Jes,
>
> Jesse Prabawa wrote:
> > The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > mentions that the position of the matches in the text does not affect
> > scoring. So is there any way that I can make the position of the
> > matches affect scoring? For example, I want matches that occur at the
> > beginning to weigh more than those that occur elsewhere in the text.
> > I have just started using Lucene so any help/advice is greatly
> > appreciated :)
>
> One quick way to get (something like) what you want is to place "the
> beginning" in a separate field from the rest of the document contents,
> then query both the "beginning" and "remainder" fields with the same
> query, boosting (i.e. weighting) the "beginning" field higher than the
> "remainder" field.
>
> E.g. (assumes SimpleAnalyzer, and default "OR" QueryParser operator):
>
>   doc1: "This is the inception.  Here is the rest."
>     "beginning" field: "this", "is", "the", "inception"
>     "remainder" field: "here", "is", "the", "rest"
>
>   doc2: "Something else here.  After the inception."
>     "beginning" field: "something", "else", "here"
>     "remainder" field: "after", "the", "inception"
>
>   query: "What does inception mean?"
>     -> "beginning:(what does inception mean)^5 remainder:(what does
>         inception mean)^1"
>
> The transformed query shown above is how it would look in QueryParser
> syntax[1] to query both fields with the same query, while boosting the
> "beginning" field higher (boost:5) than the "remainder" field (boost:1).
>
> You have to build this transformed query yourself - there is no facility
> in Lucene (that I'm aware of) for building multi-field queries with
> differently boosted fields.
>
> Both docs will match, but doc1 will score higher than doc2, since
> "inception" is in doc1's higher-weighted "beginning" field.
>
>
> Steve
>
> [1] http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> --
> Steve Rowe
> Center for Natural Language Processing
> http://www.cnlp.org/tech/lucene.asp
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
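
Steve's reply quoted above notes that the transformed query has to be built by hand. A minimal sketch of that step, using only the Java standard library, might look like the following. The class and method names (BoostedFieldQuery, terms, build) are hypothetical and not part of Lucene, and the tokenization only roughly imitates SimpleAnalyzer (split on non-letters, lowercase) so that characters such as "?" are not passed through as QueryParser wildcards. The resulting string would presumably be handed to QueryParser.parse().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a Lucene class): builds one QueryParser-syntax
// string that runs the same user query against several fields, each with
// its own boost.
public class BoostedFieldQuery {

    // Assumption: roughly imitate SimpleAnalyzer by splitting on
    // non-letter characters and lowercasing each piece.
    public static List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                out.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // Produces e.g. "beginning:(a b)^5 remainder:(a b)^1".
    // Returns an empty string if the user query contains no letters.
    public static String build(String userQuery, String[] fields, int[] boosts) {
        if (fields.length != boosts.length) {
            throw new IllegalArgumentException("fields and boosts must have the same length");
        }
        String joined = String.join(" ", terms(userQuery));
        if (joined.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(fields[i]).append(":(").append(joined).append(")^").append(boosts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field names and boosts are taken from the example in the thread.
        String q = build("What does inception mean?",
                new String[] {"beginning", "remainder"},
                new int[] {5, 1});
        System.out.println(q);
    }
}
```

With the inputs from the thread, this yields the same string as Steve's example: beginning:(what does inception mean)^5 remainder:(what does inception mean)^1. Lowercasing also means that words such as "and" or "or" in user input are treated as ordinary terms, since QueryParser only recognizes the uppercase forms as operators.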