In case it's of interest, I have a new approach I'm considering. For the basic intuition, a colleague who works with some of the users formulating these complicated queries proposed that

  (merger and agreement) w/5 (medical and companion)

is approximately the same as

  (merger w/5 agreement) w/5 (medical w/5 companion)

That is:

* and-inside-proximity can become SpanNears (agreeing with minimal interval semantics)
* you can propagate the slop value (5) down from parent queries to child queries (a new idea)

Alternatively, if you insist that the query

  merger w/5 (medical and agreement)

should match the document

  "medical x x x merger x x x agreement"

then you can propagate 2x the parent's slop value down to child queries.

At first I thought this couldn't possibly work, but it's growing on me. The advantages:

* Its results seem to feel right-ish in a lot of cases. (I say "right-ish" in order to grant that there are individual specialized cases [e.g. a w/5 (b and c)] where something else seems more right. Nonetheless, a lot of those specialized rules seem hard to generalize into something universally great.)
* Like the qsol and interval semantics approaches (it's a variant of the latter, I guess), it can assign a meaning to all boolean-inside-proximity queries, rather than just a subset.
* You get a nice 1-to-1 mapping from one subquery to one SpanQuery; the output query trees look recognizably parallel to the input queries, regardless of query complexity.

To be a little more precise, I have a hypothetical function "expand" that carries out the translation recursively:

  expand(x1 w/n x2, n)  -> SpanNear(n, expand(x1, n), expand(x2, n))
  expand(x1 or x2, n)   -> SpanOr(n, expand(x1, n), expand(x2, n))
  expand(x1 and x2, n)  -> SpanNear(n, expand(x1, n), expand(x2, n))
  expand(x1 not x2, n)  -> expand(x1, n) not/n expand(x2, n)
  expand(term, n)       -> SpanTermQuery

n is the proximity value passed in from the parent clause. (For clarity of exposition, I've removed the rules that deal with non-proximity portions of the query, and considered only binary versions of and/or.)

a not/n b means "a, not within n words of b".
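
As a rough illustration, the w/n, and, or and term rules above can be sketched in Python. This is not Lucene code. The class names (Term, Within, And, Or), the tuple-based output tree, the "factor" parameter for the 2x variant, and the brute-force "spans" matcher are all hypothetical. Two readings are assumed: a w/n clause uses its own n and hands n * factor to its children, and an unordered near matches when the combined span's width minus the children's widths is at most the slop.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical input-query AST; binary operators only, as in the rules above.
@dataclass(frozen=True)
class Term:
    text: str

@dataclass(frozen=True)
class Within:  # x1 w/n x2
    left: object
    right: object
    n: int

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

def expand(q, n, factor=1):
    """Translate q into a nested tuple that mirrors a span-query tree.

    n is the slop handed down by the parent clause; factor=2 models the
    "propagate 2x the parent's slop" variant.
    """
    if isinstance(q, Term):
        return ("term", q.text)
    if isinstance(q, Within):
        # Assumption: w/n uses its own n and passes n * factor to its children.
        child_n = q.n * factor
        return ("near", q.n,
                expand(q.left, child_n, factor),
                expand(q.right, child_n, factor))
    if isinstance(q, And):
        # and-inside-proximity becomes a near with the inherited slop.
        return ("near", n, expand(q.left, n, factor), expand(q.right, n, factor))
    if isinstance(q, Or):
        return ("or", expand(q.left, n, factor), expand(q.right, n, factor))
    raise TypeError(f"unsupported query node: {q!r}")

def spans(tree, tokens):
    """Brute-force list of (start, end) spans, end exclusive."""
    if tree[0] == "term":
        return [(i, i + 1) for i, tok in enumerate(tokens) if tok == tree[1]]
    if tree[0] == "or":
        return spans(tree[1], tokens) + spans(tree[2], tokens)
    _, slop, left, right = tree
    found = []
    for (s1, e1), (s2, e2) in product(spans(left, tokens), spans(right, tokens)):
        start, end = min(s1, s2), max(e1, e2)
        # Unordered slop: combined width minus the widths of the two children.
        if (end - start) - ((e1 - s1) + (e2 - s2)) <= slop:
            found.append((start, end))
    return found

def matches(tree, text):
    return bool(spans(tree, text.split()))

q = Within(Term("merger"), And(Term("medical"), Term("agreement")), 5)
doc = "medical x x x merger x x x agreement"
print(matches(expand(q, 0), doc))            # False: inner near needs slop 7
print(matches(expand(q, 0, factor=2), doc))  # True: inner near gets slop 10
```

The top-level n is ignored here because the outermost clause is a w/n. The not/n rule is left out of the sketch.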
I don't think it can be implemented directly using existing SpanQueries, but I think it's probably easy to extend SpanQuery to do the job.

On Wed, May 16, 2012 at 2:11 PM, Chris Harris <rygu...@gmail.com> wrote:
> I'm working on a product for librarians and similar people, who
> apparently expect to be able to combine classic boolean operators
> (i.e. AND, OR, NOT) with proximity operators (especially w/n and pre/n
> -- which basically map to unordered and ordered SpanQueries with slop
> n, respectively) in unrestricted fashion. For example, users appear to
> believe that not only are relatively easy-to-grasp queries like these
> legitimate:
>
>   medical w/5 agreement
>   (medical w/5 agreement) and (doctor w/10 rights)
>
> but also crazier ones, perhaps like
>
>   agreement w/5 (medical and companion)
>   (dog or dragon) w/5 (cat and cow)
>   (daisy and (dog or dragon)) w/25 (cat not cow)
>
> What I've noticed is that it's not always obvious how to interpret
> such queries; it's not always obvious what the user had in mind, nor
> how you might construct a Lucene query to carry out the user's intent.