Generalized proximity query performance

Kyle Maxwell Wed, 03 Oct 2007 17:17:33 -0700

Hi again,As the subject would suggest I'm trying to implement a layer of
proximity weighting over lucene.  This has greatly increased search
relevance, but at the same time has knocked down performance by a
substantial amount (see footer).


I am using a hand rolled query of the following form (implemented with
SpanNearQuery, not a sloppy PhraseQuery):
a b c => +(a AND b AND c) OR "a b"~5 OR "b c"~5

The obvious solution, "a b c"~5, is not applicable for my issues, because I
would like to allow for the possibility that a and b are near each other in
one field, while c is in another field.

So, is there something I'm missing to make this performant?  Would a
reordering, query rewriting solution help?  If there's no solution in
existing Lucene, would anyone be interested in investigating options with
me?

-Kyle


Somewhat arbitrary benchmarks.
--------------
Before:
$ ./bench.rb "paris hilton"
0.022000   0.000000   0.022000 (  0.021000)
$ ./bench.rb "paris hilton goes to jail"
0.024000   0.000000   0.024000 (  0.024000)

After:
$> ./bench.rb "paris hilton"
0.103000   0.000000   0.103000 (  0.103000)
$> ./bench.rb "paris hilton goes to jail"
1.514000   0.000000   1.514000 (  1.513000)

Generalized proximity query performance

Reply via email to