Re: Best practices in boosting by proximity?

Karl Wettin Sat, 04 May 2013 11:43:33 -0700

The simplest solution is to use slop in PhraseQuery, SpanNearQuery and possibly
other proximity queries. Also consider combining in-order and unordered variants
(see #isInOrder()) with different query boosts.
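To make the slop and in-order distinction concrete, here is a simplified, self-contained Java model of a span-near style check over a tokenized field. It is not Lucene's SpanNearQuery code. The class and method names are hypothetical, the query terms are assumed to be distinct, and slop is modelled as the number of non-matching positions inside the tightest window, which only approximates how span slop behaves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, NOT Lucene's SpanNearQuery implementation.
// Assumes distinct query terms; slop = unmatched positions inside the window.
final class SpanNearModel {

    static boolean matches(String[] doc, String[] terms, int slop, boolean inOrder) {
        return inOrder ? orderedMatch(doc, terms, slop) : unorderedMatch(doc, terms, slop);
    }

    // Ordered: from each occurrence of the first term, greedily take the earliest
    // following occurrence of every next term, giving the tightest window for that start.
    private static boolean orderedMatch(String[] doc, String[] terms, int slop) {
        for (int start = 0; start < doc.length; start++) {
            if (!doc[start].equals(terms[0])) continue;
            int pos = start;
            boolean complete = true;
            for (int t = 1; t < terms.length && complete; t++) {
                int next = -1;
                for (int p = pos + 1; p < doc.length; p++) {
                    if (doc[p].equals(terms[t])) { next = p; break; }
                }
                if (next < 0) complete = false; else pos = next;
            }
            // Window width minus the matched terms = positions in between.
            if (complete && (pos - start + 1) - terms.length <= slop) return true;
        }
        return false;
    }

    // Unordered: from each start, extend until every term has been seen (any order).
    private static boolean unorderedMatch(String[] doc, String[] terms, int slop) {
        Set<String> wanted = new HashSet<>(Arrays.asList(terms));
        for (int start = 0; start < doc.length; start++) {
            if (!wanted.contains(doc[start])) continue;
            Set<String> seen = new HashSet<>();
            for (int end = start; end < doc.length; end++) {
                if (wanted.contains(doc[end])) seen.add(doc[end]);
                if (seen.size() == wanted.size()) {
                    if ((end - start + 1) - terms.length <= slop) return true;
                    break; // tightest window for this start is too wide
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] doc = "it is known that lannisters always pay their debts".split(" ");
        String[] reversed = {"debts", "lannisters"};
        System.out.println(matches(doc, reversed, 3, true));   // false: wrong order
        System.out.println(matches(doc, reversed, 3, false));  // true: 3 words in between
    }
}
```

With the words reversed, the in-order variant fails while the unordered one still matches, which is why issuing both variants with different boosts can favour the natural word order.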


Even though slop already gives a greater score the closer the terms are, it
can still make sense in some cases (usually when combined with other
subqueries) to create a BooleanQuery that contains the same query several
times, with a greater boost on the copies with a smaller slop.
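A toy scoring model shows why the repeated, tighter clause helps. It assumes the 1 / (distance + 1) sloppy frequency used by Lucene 4.x's DefaultSimilarity and ignores idf, norms, coord and query normalization, so the numbers only illustrate the trend. The class name, tier values and the notion of a single "distance" per match are simplifications made up for this example.

```java
// Hypothetical toy model: one phrase repeated as several clauses (tiers),
// each with its own slop and boost. Only sloppyFreq is modelled; idf, norms,
// coord and queryNorm are ignored on purpose.
final class TieredSlopModel {

    static final class Tier {
        final int slop;
        final float boost;
        Tier(int slop, float boost) { this.slop = slop; this.boost = boost; }
    }

    // Assumed to mirror DefaultSimilarity.sloppyFreq in Lucene 4.x.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Every tier whose slop admits the match contributes boost * sloppyFreq.
    static float score(int distance, Tier... tiers) {
        float total = 0f;
        for (Tier tier : tiers) {
            if (distance <= tier.slop) total += tier.boost * sloppyFreq(distance);
        }
        return total;
    }

    public static void main(String[] args) {
        Tier[] tiers = { new Tier(0, 4f), new Tier(5, 2f), new Tier(20, 1f) };
        for (int d : new int[] {0, 3, 10, 25}) {
            System.out.println("distance " + d + " -> " + score(d, tiers));
        }
    }
}
```

With a single ~20 clause, an exact match outscores a match at distance 3 by 4:1; with the three tiers above the ratio grows to roughly 9:1 (7.0 vs 0.75).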

You could also consider using shingles (possibly in combination with the
above) for matching documents in which terms appear next to each other.
Generally it's hard to define a best practice; it depends on the corpus your
index represents, your queries and your needs.
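Shingles turn neighbouring words into single indexed terms, so a plain term lookup on "pay their" essentially only hits documents where those words were adjacent, without positional phrase matching at query time. The sketch below merely mimics the output of a size-2 shingle filter such as Lucene's ShingleFilter; the class name and the lowercase-and-whitespace tokenization are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that imitates word shingles (word n-grams joined by a space).
// Naive tokenization: lowercase, split on whitespace.
final class Shingles {

    static List<String> shingles(String text, int size) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("Lannisters always pay their debts", 2));
        // [lannisters always, always pay, pay their, their debts]
    }
}
```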

Judging from your question, it looks like you're using the query parser. Try
something like "your proximity query"~20, but keep in mind that a large slop
is costly.
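The tiered-slop BooleanQuery idea can also be written directly in classic query parser syntax, where a quoted phrase takes ~slop and ^boost suffixes and, with the default OR operator, each copy becomes an optional clause. The helper below is a hypothetical string builder; the slop and boost values are placeholders, and the phrase is assumed to contain no quote characters.

```java
// Hypothetical builder for classic query parser syntax: the same phrase repeated
// with several slops, where tighter slops are expected to get larger boosts.
final class TieredPhraseQueryString {

    static String build(String phrase, int[] slops, int[] boosts) {
        if (slops.length != boosts.length) {
            throw new IllegalArgumentException("slops and boosts differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < slops.length; i++) {
            if (sb.length() > 0) sb.append(' ');
            // Assumes the phrase contains no '"' characters that would need escaping.
            sb.append('"').append(phrase).append('"').append('~').append(slops[i]);
            if (boosts[i] != 1) sb.append('^').append(boosts[i]); // ^1 is the default
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("your proximity query", new int[] {2, 20}, new int[] {4, 1}));
        // "your proximity query"~2^4 "your proximity query"~20
    }
}
```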


                karl 

On 4 May 2013, at 19:46, Gili Nachum wrote:

> Hi. I would like hits that contain the search terms in proximity to each
> other to be ranked higher than hits in which the terms are scattered
> across the doc. Is there a best practice to achieve that?
> I also want every hit to contain all of the search terms (implicit
> AND).
> 
> *Example:* when users search for "lannisters always pay their debts", the
> 4 matching results should be ranked as follows (for simplicity, assume
> equal field norms and TF/IDF in all hits):
> 1. "It is known that *Lannisters always pay their debts*"
> 2. "... Lannisters ... they sometimes *pay their debts* ... always with you"
> 3. "*Lannisters always* win ... debts ... pay tax ... their nature"
> 4. "Lannisters ... always ... pay ... their ... debts"
> 
> The first result has all 5 terms in proximity to each other.
> The second has 3 terms in proximity.
> The third has 2 terms in proximity.
> The fourth has none of the terms in proximity to each other.
> 
> My current AND query that ignores proximity is: +lannisters +always +pay
> +their +debts
> So if there are M terms, I was thinking that I could add M-1 SHOULD phrase
> queries to the original query:
> "lannisters always" "always pay" "pay their" "their debts".
> 
> What are the pros and cons? Are there alternatives to consider?
> Any Lucene class that helps achieve this?
> 
> Thx!
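The query proposed above (all M terms required plus M-1 optional adjacent-pair phrases) can be generated mechanically. This hypothetical helper emits classic query parser syntax and assumes simple lowercase, whitespace tokenization that matches the field's analyzer.

```java
// Hypothetical builder: every term is required (+term, the implicit AND) and each
// adjacent pair is added as an optional (SHOULD) phrase clause.
final class AdjacentPairQuery {

    static String build(String userQuery) {
        String[] terms = userQuery.toLowerCase().trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        // M required clauses.
        for (String term : terms) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(term);
        }
        // M-1 optional phrase clauses, one per adjacent pair.
        for (int i = 0; i + 1 < terms.length; i++) {
            sb.append(" \"").append(terms[i]).append(' ').append(terms[i + 1]).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("lannisters always pay their debts"));
    }
}
```

Documents containing more adjacent pairs match more of the optional phrases and so collect more score, while the required clauses keep the implicit AND.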


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
