I realize my statement of dread may be news to some; here are my references.

QueryParser not handling queries containing AND and OR
http://issues.apache.org/jira/browse/LUCENE-167

Query Parser flags clauses with explicit OR as required when followed by explicit AND
http://issues.apache.org/jira/browse/LUCENE-218

TERM1 OR NOT TERM2 does not perform as expected (single negated queries don't work)
http://issues.apache.org/jira/browse/LUCENE-666

Don't mix operators "+", "-" with "AND", "NOT", etc.
http://issues.apache.org/jira/browse/LUCENE-72

Very interesting thread at:
http://marc.theaimsgroup.com/?l=lucene-user&m=107096388328864&w=2


"an expression without parenthesis, when interpreted, assumes terms on either side of an AND clause are compulsory terms, and any terms on either side of an OR clause are optional. However, if you combine AND and OR in an expression, the optional terms have no effect because the others are compulsory."
http://marc.theaimsgroup.com/?l=lucene-user&m=107107383315532&w=2

All open query parser issues, 19 total.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&pid=12310110&sorter/order=DESC&sorter/field=priority&resolution=-1&component=12310234
--Renaud



----- Original Message -----
From: "Renaud Waldura" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Thursday, October 12, 2006 4:11 PM
Subject: QueryParser Is Badly Broken


I'm developing an application used by scientists -- people who have a pretty good idea of what logic is -- and they were shocked to find out that no two of these queries return the same results:

1- banana AND apple OR orange
2- banana AND (apple OR orange)
3- (banana AND apple) OR orange

I'd expect (1) to be either (2) or (3), but it turns out it's parsed as "+banana apple orange". I was rather, uh, dismayed by this find, as it doesn't seem to make sense.
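For anyone trying to see how (1) can end up as "+banana apple orange", here is a minimal sketch of the left-to-right flagging rule described in the thread quoted at the top: an AND marks the terms on both sides as required, an OR marks them as optional, and a later operator overwrites what an earlier one decided for the term they share. This is a toy model written for illustration, not Lucene's QueryParser code; the class and method names (FlatBooleanSketch, flatten) are made up, and it ignores parentheses, NOT, "+" and "-".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of left-to-right clause flagging. NOT Lucene's QueryParser. */
final class FlatBooleanSketch {

    /**
     * Walks the query left to right and flags each term as required ("+")
     * or optional (no prefix). Only bare terms joined by AND/OR are handled.
     */
    static String flatten(String query) {
        String[] tokens = query.trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String pendingOp = null; // operator seen since the previous term

        for (String token : tokens) {
            if (token.equals("AND") || token.equals("OR")) {
                pendingOp = token;
                continue;
            }
            boolean isAnd = "AND".equals(pendingOp);
            if (pendingOp != null && !terms.isEmpty()) {
                // The operator re-flags the term on its left, overwriting
                // whatever an earlier operator decided for that term.
                required.set(required.size() - 1, isAnd);
            }
            terms.add(token);
            // Assumption of this sketch: a term with no operator in front
            // of it starts out optional.
            required.add(isAnd);
            pendingOp = null;
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) out.append(' ');
            if (required.get(i)) out.append('+');
            out.append(terms.get(i));
        }
        return out.toString();
    }
}
```

In this model, flatten("banana AND apple OR orange") gives "+banana apple orange": the AND first makes banana and apple required, then the OR downgrades apple to optional along with orange. No grouping is ever built, which is why the result matches neither (2) nor (3).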

I just spent half a day reading up on the various ways QueryParser is broken, by going through the bugs and the mailing-list archives. And I'm still unable to come to a conclusion. Here's where I'm at:

a- queries which mix boolean operators require strict parenthesizing to work right

b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and the implicit default operator (terms with no operator between them) rarely does what you expect

c- the stock QueryParser doesn't work well in these cases

d- there's a new PrecedenceQueryParser at http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous that solves *some* of the issues but creates others

e- there is a non-Lucene effort to create a query parser with a different syntax at http://famestalker.com/devwiki/

While we are also developing a query-building UI, users must be able to enter text queries as well. What do other folks do? I mean, this is pretty bad. I can hardly go back to my scientists and tell them Lucene is unable to handle 2 boolean operators, that they should parenthesize everything by hand. I mean, that's just cheesy.

--Renaud




