On Oct 12, 2006, at 7:11 PM, Renaud Waldura wrote:
> I'm developing an application used by scientists -- people who have
> a pretty good idea of what logic is -- and they were shocked to
> find out that none of these queries return the same results:
> 1- banana AND apple OR orange
> 2- banana AND (apple OR orange)
> 3- (banana AND apple) OR orange
> I'd expect (1) to be either (2) or (3), but it turns out it's
> parsed as "+banana apple orange". I was rather, uh, dismayed by
> this find, as it doesn't seem to make sense.

It's not news to the die-hard Luceners that QueryParser is mangled.
It's a kitchen sink syntax with more bells and whistles than most
applications need. I've yet to come across a project that has used
QueryParser as-is, not because it's "broken", but because every
application has been unique in how queries are expressed by users.

> a- queries which mix boolean operators require strict
> parenthesizing to work right
> b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT"
> and the default operator "" rarely does what you expect

AND/OR are oddly named in terms of how they map to the underlying
BooleanQuery they create. AND really means to make both clauses
MUST, and OR means to make them SHOULD. And, as you've painfully
experienced, the precedence is not "logical".
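
To make the difference concrete, here is a minimal sketch in plain Java of how a flat list of MUST/SHOULD clauses matches a document. It is a simplified model, not Lucene's code: the class FlatBooleanModel, its Occur enum, and the matches, flatQuery, and doc helpers are all made up for illustration, and scoring is ignored. It assumes that SHOULD clauses are optional once a MUST clause is present, and that at least one SHOULD clause has to match when there is no MUST clause.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of a flat boolean query. Not Lucene's classes.
final class FlatBooleanModel {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String term;

        Clause(Occur occur, String term) {
            this.occur = occur;
            this.term = term;
        }
    }

    // True when the document's terms satisfy the flat clause list.
    static boolean matches(Set<String> docTerms, List<Clause> clauses) {
        boolean hasMust = false;
        boolean shouldHit = false;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            if (c.occur == Occur.MUST) {
                if (!present) return false;   // a required term is missing
                hasMust = true;
            } else if (c.occur == Occur.MUST_NOT) {
                if (present) return false;    // a prohibited term is present
            } else {
                shouldHit |= present;         // optional term
            }
        }
        // Assumption: without any MUST clause, one SHOULD clause must match.
        return hasMust || shouldHit;
    }

    // The flat form reported in this thread: +banana apple orange
    static List<Clause> flatQuery() {
        return Arrays.asList(
            new Clause(Occur.MUST, "banana"),
            new Clause(Occur.SHOULD, "apple"),
            new Clause(Occur.SHOULD, "orange"));
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
            doc("banana"), doc("orange"), doc("banana", "apple"));
        for (Set<String> d : docs) {
            boolean b = d.contains("banana");
            boolean a = d.contains("apple");
            boolean o = d.contains("orange");
            System.out.println(d
                + " flat=" + matches(d, flatQuery())
                + " (2)=" + (b && (a || o))
                + " (3)=" + ((b && a) || o));
        }
    }
}
```

In this model a document containing only "banana" matches "+banana apple orange" but satisfies neither (2) nor (3), while a document containing only "orange" satisfies (3) but not the flat form.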

> c- the stock QueryParser doesn't work well in these cases
> d- there's a new PrecedenceQueryParser at
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/miscellaneous
> that solves *some* of the issues but creates others

What issues does PQP create? Perhaps we can get those fixed and
replace QueryParser with it.

> While we are also developing a query-building UI, users must be
> able to enter text queries as well. What do other folks do? I mean,
> this is pretty bad. I can hardly go back to my scientists and tell
> them Lucene is unable to handle 2 boolean operators, that they
> should parenthesize everything by hand. I mean, that's just cheesy.

It really boils down to user interface, from my perspective. Do the
users need to type in all of that kind of logic? Or could they be
presented with a simpler syntax with just +/- in front of terms to
indicate MUST/NOT (and SHOULD with no prefix)? Perhaps they could be
presented with two text boxes, one for required terms, and another
for optional terms (and maybe another for prohibited terms)?
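
As a rough sketch of the simpler +/- syntax, something like the following could turn user input into a flat clause list. This is plain Java and only an illustration: PrefixSyntaxSketch, its Occur enum, and the parse and render methods are hypothetical names, not part of Lucene, and the sketch assumes whitespace-separated single terms with no phrases, fields, or analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a "+term -term term" syntax. Not a Lucene class.
final class PrefixSyntaxSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String term;

        Clause(Occur occur, String term) {
            this.occur = occur;
            this.term = term;
        }

        @Override
        public String toString() {
            String prefix = occur == Occur.MUST ? "+"
                          : occur == Occur.MUST_NOT ? "-" : "";
            return prefix + term;
        }
    }

    // "+" marks a required term, "-" a prohibited one, no prefix an optional one.
    static List<Clause> parse(String input) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) continue;          // blank input
            Occur occur = Occur.SHOULD;
            String term = token;
            char first = token.charAt(0);
            if (first == '+') {
                occur = Occur.MUST;
                term = token.substring(1);
            } else if (first == '-') {
                occur = Occur.MUST_NOT;
                term = token.substring(1);
            }
            if (!term.isEmpty()) {                  // ignore a bare "+" or "-"
                clauses.add(new Clause(occur, term));
            }
        }
        return clauses;
    }

    // Renders the clauses back into the prefix syntax, one space apart.
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(parse("+banana apple -orange")));
    }
}
```

In a real application each clause would presumably be analyzed and added to a BooleanQuery with the matching occur flag; the three-text-box variant would just build the same list from three inputs instead of prefixes.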
We are all certainly very open to improving QueryParser, or PQP.
Erik