I realize my statement of dread may be news to some; here are my references.
QueryParser not handling queries containing AND and OR
http://issues.apache.org/jira/browse/LUCENE-167
Query Parser flags clauses with explicit OR as required when followed by
explicit AND
http://issues.apache.org/jira/browse/LUCENE-218
TERM1 OR NOT TERM2 does not perform as expected (single negated queries
don't work)
http://issues.apache.org/jira/browse/LUCENE-666
Don't mix operators "+", "-" with "AND", "NOT", etc.
http://issues.apache.org/jira/browse/LUCENE-72
Very interesting thread at:
http://marc.theaimsgroup.com/?l=lucene-user&m=107096388328864&w=2
"an expression without parenthesis, when interpreted, assumes terms on
either side of an AND clause are compulsory terms, and any terms on either
side of an OR clause are optional. However, if you combine AND and OR in an
expression, the optional terms have no effect because the others are
compulsory."
http://marc.theaimsgroup.com/?l=lucene-user&m=107107383315532&w=2
All open query parser issues, 19 total.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310110&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12310234
--Renaud
----- Original Message -----
From: "Renaud Waldura" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Thursday, October 12, 2006 4:11 PM
Subject: QueryParser Is Badly Broken
I'm developing an application used by scientists -- people who have a
pretty good idea of what logic is -- and they were shocked to find out
that no two of these queries return the same results:
1- banana AND apple OR orange
2- banana AND (apple OR orange)
3- (banana AND apple) OR orange
I'd expect (1) to be either (2) or (3), but it turns out it's parsed as
"+banana apple orange". I was rather, uh, dismayed by this find, as it
doesn't seem to make sense.
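To make the difference concrete, here is a minimal sketch in Python that models the matching rules instead of calling Lucene. The function names and the three sample documents are made up for illustration. It assumes the usual BooleanQuery behavior: once any clause is required, optional clauses stop deciding whether a document matches and only influence its score.

```python
# Illustrative model only; nothing here calls Lucene.
def matches_flags(doc, required=(), optional=(), prohibited=()):
    """Match rule for a flat list of flagged clauses, e.g. '+banana apple orange'."""
    terms = set(doc)
    if any(t in terms for t in prohibited):
        return False
    if required:
        # Assumption: with a required clause present, optional clauses
        # no longer affect matching (they would only affect scoring).
        return all(t in terms for t in required)
    return any(t in terms for t in optional)

def as_parsed(doc):   # query 1 as reported above: +banana apple orange
    return matches_flags(doc, required=["banana"], optional=["apple", "orange"])

def reading_2(doc):   # banana AND (apple OR orange)
    return "banana" in doc and ("apple" in doc or "orange" in doc)

def reading_3(doc):   # (banana AND apple) OR orange
    return ("banana" in doc and "apple" in doc) or "orange" in doc

# Hypothetical documents, each reduced to its set of terms.
docs = [{"banana"}, {"orange"}, {"banana", "apple"}]
for doc in docs:
    print(sorted(doc), as_parsed(doc), reading_2(doc), reading_3(doc))
```

Under this model, a document containing only "banana" matches the parsed form but neither parenthesized query, and a document containing only "orange" matches (3) alone.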
I just spent half a day reading up on the various ways QueryParser is
broken, by going through the bugs and the mailing-list archives. And I'm
still unable to come to a conclusion. Here's where I'm at:
a- queries which mix boolean operators require strict parenthesizing to
work right
b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and
the default operator "" rarely does what you expect
c- the stock QueryParser doesn't work well in these cases
d- there's a new PrecedenceQueryParser at
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous
that solves *some* of the issues but creates others
e- there is a non-Lucene effort to create a query parser with a
different syntax at http://famestalker.com/devwiki/
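For point (a), one stopgap would be to add the parentheses automatically before the text reaches the parser. The following is a minimal sketch in Python, not a tested solution. The helper name parenthesize is hypothetical, and it assumes AND should bind tighter than OR, which selects reading (3) above. It handles only flat queries of single-word terms joined by AND/OR.

```python
def parenthesize(query):
    """Make grouping explicit for a flat query of terms joined by AND / OR.

    Assumption: AND binds tighter than OR. Anything beyond a flat
    'term (AND|OR term)*' query is rejected, not guessed at.
    """
    tokens = query.split()
    # Crude guard: also rejects hyphenated terms, which is fine for a sketch.
    if any(ch in query for ch in '()+-"') or "NOT" in tokens:
        raise ValueError("only flat AND/OR queries are handled in this sketch")
    if len(tokens) % 2 == 0 or any(
        (tok in ("AND", "OR")) != (i % 2 == 1) for i, tok in enumerate(tokens)
    ):
        raise ValueError("expected: term (AND|OR term)*")
    # Split on OR; each group is a run of terms joined by AND.
    groups, current = [], [tokens[0]]
    for op, term in zip(tokens[1::2], tokens[2::2]):
        if op == "AND":
            current.append(term)
        else:
            groups.append(current)
            current = [term]
    groups.append(current)
    if len(groups) == 1:
        return " AND ".join(groups[0])
    return " OR ".join(
        "(" + " AND ".join(g) + ")" if len(g) > 1 else g[0] for g in groups
    )

print(parenthesize("banana AND apple OR orange"))
```

The rewritten string could then be handed to the stock QueryParser. Whether AND-over-OR is the precedence my users expect is a separate question.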
While we are also developing a query-building UI, users must be able to
enter text queries as well. What do other folks do? I mean, this is pretty
bad. I can hardly go back to my scientists and tell them Lucene is unable
to handle 2 boolean operators, that they should parenthesize everything by
hand. I mean, that's just cheesy.
--Renaud