Re: Approaches/semantics for arbitrarily combining boolean and proximity search operators?

Chris Harris Fri, 25 May 2012 19:07:54 -0700

In case it's of interest, I have a new approach I'm considering.

For the basic intuition, a colleague who works with some of the users
formulating these complicated queries proposed that

(merger and agreement) w/5 (medical and companion)

is approximately the same as

(merger w/5 agreement) w/5 (medical w/5 companion)

That is
* and-inside-proximity can become SpanNears (agreeing with minimal
interval semantics)
* you can propagate the slop value (5) down from parent queries to
child queries (a new idea)
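As a toy illustration of that rewrite (not taken from any real parser), the sketch below assumes queries are held as nested tuples, a made-up representation, and uses a hypothetical helper `push_slop` to turn every `and` under a proximity operator into `w/n` with the enclosing n.

```python
# Toy query representation (an assumption for illustration, not a Lucene API):
#   a term is a str; a boolean node is (op, left, right);
#   a proximity node is ("w", n, left, right).

def push_slop(q, n=None):
    """Rewrite 'and' nodes that sit under a proximity operator into w/n nodes."""
    if isinstance(q, str):
        return q
    if q[0] == "w":
        _, m, left, right = q
        # The proximity node hands its own slop down to both children.
        return ("w", m, push_slop(left, m), push_slop(right, m))
    op, left, right = q
    if op == "and" and n is not None:
        return ("w", n, push_slop(left, n), push_slop(right, n))
    # 'and' outside any proximity operator, or any other operator: keep it.
    return (op, push_slop(left, n), push_slop(right, n))


def show(q, top=True):
    """Render a tuple tree back into the w/n query syntax."""
    if isinstance(q, str):
        return q
    if q[0] == "w":
        text = f"{show(q[2], False)} w/{q[1]} {show(q[3], False)}"
    else:
        text = f"{show(q[1], False)} {q[0]} {show(q[2], False)}"
    return text if top else f"({text})"


query = ("w", 5, ("and", "merger", "agreement"), ("and", "medical", "companion"))
print(show(push_slop(query)))
# (merger w/5 agreement) w/5 (medical w/5 companion)
```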

Alternatively, if you insist that query

merger w/5 (medical and agreement)

should match document "medical x x x merger x x x agreement"

then you can propagate 2x the parent's slop value down to child queries.
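
To see why 1x is not enough for that document, here is a small position-based sketch. It is a simplified model, not Lucene itself: I'm assuming the slop of an unordered near match is the width of the covering span minus the summed widths of the two sub-spans, and the helper names (`term_spans`, `near_spans`, `matches`) are made up for the example.

```python
from itertools import product


def term_spans(doc, term):
    """Spans are (start, end) with end exclusive, one per occurrence of term."""
    return [(i, i + 1) for i, word in enumerate(doc) if word == term]


def near_spans(slop, left, right):
    """Unordered near (assumed model): keep pairs of sub-spans whose covering
    span has at most `slop` positions not accounted for by the sub-spans."""
    out = []
    for a, b in product(left, right):
        start, end = min(a[0], b[0]), max(a[1], b[1])
        gap = (end - start) - ((a[1] - a[0]) + (b[1] - b[0]))
        if gap <= slop:
            out.append((start, end))
    return out


doc = "medical x x x merger x x x agreement".split()


def matches(child_slop):
    """merger w/5 (medical and agreement), with the 'and' run at child_slop."""
    child = near_spans(child_slop,
                       term_spans(doc, "medical"),
                       term_spans(doc, "agreement"))
    return bool(near_spans(5, term_spans(doc, "merger"), child))


print(matches(5))   # False: medical and agreement are 7 words apart
print(matches(10))  # True: the child span covers merger, so the outer w/5 holds
```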

At first I thought this couldn't possibly work, but it's growing on me.

The advantages:
* Its results seem to feel right-ish in a lot of cases. (I say
"right-ish" to grant that there are individual specialized
cases [e.g. a w/5 (b and c)] where something else seems more right.
Nonetheless, a lot of those specialized rules seem hard to generalize
into something universally great.)
* Like the qsol and interval semantics approaches (it's a variant of
the latter, I guess), it can assign a meaning to all
boolean-inside-proximity queries, rather than just a subset
* You get a nice 1-to-1 mapping from one subquery to one SpanQuery;
the output query trees look recognizably parallel to the input
queries, regardless of query complexity.

To be a little more precise, I have a hypothetical function "expand"
that carries out the translation recursively:

expand(x1 w/n x2, n) -> SpanNear(n, expand(x1, n), expand(x2, n))
expand(x1 or x2, n) -> SpanOr(expand(x1, n), expand(x2, n))
expand(x1 and x2, n) -> SpanNear(n, expand(x1, n), expand(x2, n))
expand(x1 not x2, n) -> expand(x1, n) not/n expand(x2, n)
expand(term, n) -> SpanTermQuery(term)

n is the proximity value passed in from the parent clause. (For
clarity of exposition, I've removed the rules that deal with
non-proximity portions of the query, and considered only binary
versions of and/or.)
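
A rough Python rendering of those rules, for anyone who wants to play with them. The `Term`, `Near` and `BoolOp` classes are hypothetical stand-ins for a parsed query, and the output is just a string naming the span query tree, not real Lucene objects. Two assumptions of mine: an explicit w/n replaces whatever n was inherited from above, and SpanOr is printed without a slop since an "or" has no distance to enforce. `NotNear` is a placeholder name for the not/n operator.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str


@dataclass
class Near:          # x1 w/n x2
    n: int
    left: object
    right: object


@dataclass
class BoolOp:        # op is "and", "or" or "not"
    op: str
    left: object
    right: object


def expand(q, n):
    """Translate a parsed query into a span-query tree, rendered as text.

    n is the proximity value handed down by the parent clause.
    """
    if isinstance(q, Term):
        return f"SpanTerm({q.text})"
    if isinstance(q, Near):
        n = q.n  # assumption: an explicit w/n overrides the inherited value
        return f"SpanNear({n}, {expand(q.left, n)}, {expand(q.right, n)})"
    left, right = expand(q.left, n), expand(q.right, n)
    if q.op == "and":
        return f"SpanNear({n}, {left}, {right})"
    if q.op == "or":
        return f"SpanOr({left}, {right})"
    if q.op == "not":
        return f"NotNear({n}, {left}, {right})"  # hypothetical not/n query
    raise ValueError(f"unknown operator: {q.op}")


query = Near(5,
             BoolOp("and", Term("merger"), Term("agreement")),
             BoolOp("and", Term("medical"), Term("companion")))
print(expand(query, 5))
# SpanNear(5, SpanNear(5, SpanTerm(merger), SpanTerm(agreement)), SpanNear(5, SpanTerm(medical), SpanTerm(companion)))
```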

a not/n b means "a, not within n words of b". I don't think it can be
implemented directly using existing SpanQueries, but I think it's
probably easy to extend SpanQuery to do the job.
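
To pin down the semantics I have in mind, here is a sketch over plain term positions only (real spans have a start and an end, which this ignores). I'm assuming "within n words" means at most n words in between, the same way slop is counted; `not_near` is a made-up helper, not an existing SpanQuery.

```python
def not_near(n, a_positions, b_positions):
    """Keep each position of a that has no b with at most n words in between."""
    return [pa for pa in a_positions
            if all(abs(pa - pb) - 1 > n for pb in b_positions)]


doc = "cat x x cow x x x x x x cat".split()
cats = [i for i, word in enumerate(doc) if word == "cat"]
cows = [i for i, word in enumerate(doc) if word == "cow"]

# cat not/5 cow: the first cat is 2 words from cow, the second is 6 words away.
print(not_near(5, cats, cows))  # [10]
```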

On Wed, May 16, 2012 at 2:11 PM, Chris Harris <rygu...@gmail.com> wrote:
> I'm working on a product for librarians and similar people, who
> apparently expect to be able to combine classic boolean operators
> (i.e. AND, OR, NOT) with proximity operators (especially w/n and pre/n
> -- which basically map to unordered and ordered SpanQueries with slop
> n, respectively) in unrestricted fashion. For example, users appear to
> believe that not only are relatively easy-to-grasp queries like these
> legitimate:
>
> medical w/5 agreement
> (medical w/5 agreement) and (doctor w/10 rights)
>
> but also crazier ones, perhaps like
>
> agreement w/5 (medical and companion)
> (dog or dragon) w/5 (cat and cow)
> (daisy and (dog or dragon)) w/25 (cat not cow)
>
> What I've noticed is that it's not always obvious how to interpret
> such queries; it's not always obvious what the user had in mind, nor
> how you might construct a Lucene query to carry out the user's intent.
