On Oct 12, 2006, at 7:11 PM, Renaud Waldura wrote:
I'm developing an application used by scientists -- people who have a pretty good idea of what logic is -- and they were shocked to find out that neither of these queries return the same results:

1- banana AND apple OR orange
2- banana AND (apple OR orange)
3- (banana AND apple) OR orange

I'd expect (1) to be either (2) or (3), but it turns out it's parsed as "+banana apple orange". I was rather, uh, dismayed by this find, as it doesn't seem to make sense.

It's not news to the die hard Luceners that QueryParser is mangled. It's a kitchen sink syntax with more bells and whistles than most applications need. I've yet to come across a project that has used QueryParser as-is, not because it's "broken", but because every application has been unique in how queries are expressed by users.

a- queries which mix boolean operators require strict parenthesizing to work right

b- "+" isn't shorthand for "AND"; using it with "AND"/"OR"/"NOT" and the default operator "" rarely does what you expect

AND/OR are oddly named in terms of how they map to the underlying BooleanQuery they create. AND really means to make both clauses MUST, and OR means to make them SHOULD. And, as you've painfully experienced, the precedence is not "logical".

   c- the stock QueryParser doesn't work well in these cases

d- there's a new PrecedenceQueryParser at http://svn.apache.org/ repos/asf/lucene/java/trunk/contrib/miscellaneous that solves *some* of the issues but creates others

What issues does PQP create? Perhaps we can get those fixed and replace QueryParser with it.

While we are also developing a query-building UI, users must be able to enter text queries as well. What do other folks do? I mean, this is pretty bad. I can hardly go back to my scientists and tell them Lucene is unable to handle 2 boolean operators, that they should parenthesize everything by hand. I mean, that's just cheesy.

It really boils down to user interface, from my perspective. Do the users need to type in all of that kind of logic? Or could they be presented with a simpler syntax with just +/- in front of terms to indicate MUST/NOT (and SHOULD with no prefix)? Perhaps they could be presented with two text boxes, one for required terms, and another for optional terms (and maybe another for prohibited terms)?

We are all certainly very open to improving QueryParser, or PQP.

        Erik



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to