I am not a huge fan of the QueryParser's syntax, so I have started an open source project to create a viable alternative. I could really use some help testing it out. The more testing it gets, the better chance it has of serving the community. The parser is called Qsol, and I am right up against its initial release. So far it:

offers a simple, clean syntax.
allows arbitrary combinations/nesting of proximity and boolean queries.
allows special date field processing (date searches can use a constant-score range filter).
offers other minor features, like makeAllTermsFuzzy() to make your standard search a fuzzy search (it would probably be god-awful slow, I know, but I think I have seen this option in MediaWare).

The initial release (if I can get some people to take the plunge and help me test) will also include sentence/paragraph proximity search support and a Google Suggest/spell-check type function. I have roughly implemented both of these, but have not combined them into the parser yet.

I have set up a rough page with some sparse documentation for the parser at http://famestalker.com/devwiki/ You can download the jar there.

A general query parser is such a pluggable part of Lucene that it would be really nice to have a few viable options. It seems that everyone who makes one keeps it proprietary (other than Surround). Help me push this thing to a 1.0 release! It is almost there. Try it out! Keep in mind, there are probably plenty of optimizations to be had in the future.

Below is a simple syntax explanation and some sample queries.

- Mark Miller


   Order of Operations

  1. '( )' *parenthesis* : me & (him | her)
  2. '!' *and not* : mill ! bucketloader
  3. '~' *within* : score ~5 lunch (use 'ord' to only find terms in
     order : score ord~5 lunch)
  4. '&' *and* : beat & pony
  5. '|' *or* : him | her

Spaces between terms default to '&', but this can be changed to '|'.
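
To make the ordering concrete, here is a minimal sketch in Java of how a query could be grouped under the precedence listed above. This is not Qsol's actual parser: the class PrecedenceSketch and its group() method are hypothetical names made up for illustration, it only knows the binary operators '!', '~N' / 'ord~N', '&' and '|', and it assumes a well-formed query (no quotes, ranges, fields, or error handling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Qsol.
final class PrecedenceSketch {
    // Parentheses, operators, proximity operators such as ~5 or ord~5, then plain terms.
    private static final Pattern TOKEN =
        Pattern.compile("\\(|\\)|!|&|\\||(?:ord)?~\\d+|[^\\s()!&|~]+");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrecedenceSketch(String query) {
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    // Higher number = binds tighter; -1 means "not a binary operator".
    private static int strength(String tok) {
        if (tok.equals("!")) return 4;
        if (tok.contains("~")) return 3;
        if (tok.equals("&")) return 2;
        if (tok.equals("|")) return 1;
        return -1;
    }

    // Returns the query with every binary operation explicitly parenthesized.
    static String group(String query) {
        return new PrecedenceSketch(query).parse(1);
    }

    // Precedence climbing: only consume operators at least as strong as minStrength.
    private String parse(int minStrength) {
        String left = primary();
        while (pos < tokens.size() && !tokens.get(pos).equals(")")) {
            String tok = tokens.get(pos);
            boolean implicit = strength(tok) < 0; // a term or '(' follows: implied '&'
            String op = implicit ? "&" : tok;
            int s = strength(op);
            if (s < minStrength) {
                break;
            }
            if (!implicit) {
                pos++; // consume the explicit operator
            }
            String right = parse(s + 1); // s + 1 makes operators left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String primary() {
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            String inner = parse(1);
            pos++; // skip the closing ')'
            return inner;
        }
        return tok;
    }
}
```

With this sketch, PrecedenceSketch.group("search | old ~3 horse") returns "(search | (old ~3 horse))", and group("him | her & beat ! pony") returns "(him | (her & (beat ! pony)))".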

*Escape* - A '\' will escape an operator : m\&m's
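
If you are passing raw user text through and do not want it treated as operators, something along these lines could do the escaping up front. This is only a sketch: EscapeSketch and escapeOperators() are hypothetical names, not part of Qsol, and I am assuming that only the operator characters from the list above need a backslash (quotes, wildcards, and range characters are left alone).

```java
// Hypothetical helper; not part of Qsol.
final class EscapeSketch {
    // Assumption: just the operator characters from the order-of-operations list.
    private static final String OPERATORS = "()!~&|";

    // Prefixes each operator character with a backslash.
    static String escapeOperators(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (char c : text.toCharArray()) {
            if (OPERATORS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, escapeOperators("m&m's") returns m\&m's.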

*Quotes* - an in-order phrase search with or without a specified slop : "holy war sick":3 | "gimme all my cake"

*Range Queries* - a query in the form: /beginword - endword/ will perform a range search. The default search is inclusive. For an exclusive search use '--' instead of '-' : creditcard[23907094 - 23094345] | creditcard[23907094 -- 23094345]

*Wildcards* - * indicates zero or more unknowns and ? indicates a single unknown : old harr*t?n | kil?r

A wildcard query cannot begin with an unknown.
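
To spell out what the two wildcard characters mean, here is a small Java sketch that turns a wildcard term into a regular expression and rejects a leading unknown. It only illustrates the matching semantics described above; it says nothing about how Qsol or Lucene actually evaluate wildcards against the index, and WildcardSketch with its methods is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the wildcard semantics; not part of Qsol.
final class WildcardSketch {
    // '*' = zero or more unknowns, '?' = exactly one unknown.
    static Pattern toRegex(String wildcard) {
        if (wildcard.isEmpty() || wildcard.charAt(0) == '*' || wildcard.charAt(0) == '?') {
            throw new IllegalArgumentException("a wildcard query cannot begin with an unknown");
        }
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(re.toString());
    }

    // True if the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

Here matches("harr*t?n", "harrington") is true, matches("kil?r", "killer") is false (the '?' stands for exactly one character), and toRegex("*ton") throws an IllegalArgumentException.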

*Fuzzy Query* : a ` indicates the preceding term should be a fuzzy term : old carrot & devil` may cry

*Paragraph/Sentence Proximity Searching*

If you have enabled sentence and paragraph proximity searching, then the '~' operator may also be used as '~3p' or '~5s' to perform paragraph and sentence proximity searches.

*Sample Queries:*

       example = "(good witch & \"killa the willaw\") ~4 scary ! man";
       expected = "+(+spanNear([allFields:good, allFields:scary], 4, false) -spanNear([allFields:good, allFields:man], 4, false)) +(+spanNear([allFields:witch, allFields:scary], 4, false) -spanNear([allFields:witch, allFields:man], 4, false)) +(+spanNear([spanNear([allFields:killa, allFields:willaw], 1, true), allFields:scary], 4, false) -spanNear([spanNear([allFields:killa, allFields:willaw], 1, true), allFields:man], 4, false))";
       assertEquals(expected, parse(example));

       example = "beat` old magpie`";
       expected = "+allFields:beat~0.5 +allFields:old +allFields:magpie~0.5";
       assertEquals(expected, parse(example));

       example = "me \\| the & test & hole";
       expected = "+allFields:me +allFields:test +allFields:hole";
       assertEquals(expected, parse(example));

       example = "\"test the big search\":30 & me";
       expected = "+spanNear([allFields:test, allFields:big, allFields:search], 30, true) +allFields:me";
       assertEquals(expected, parse(example));

       example = "me & fox & cop";
       expected = "+allFields:me +allFields:fox +allFields:cop";
       assertEquals(expected, parse(example));

       example = "date[8/5/82]";
       expected = "date:19820805";
       assertEquals(expected, parse(example));

       example = "date[> 12/31/02]";
       expected = "ConstantScore(date:[20021231-})";
       assertEquals(expected, parse(example));

       example = "date[< 03/23/2004]";
       expected = "ConstantScore(date:{-20040323])";
       assertEquals(expected, parse(example));

       example = "date[3/23/2004 - 6/34/02]";
       expected = "ConstantScore(date:[20040323-20020704])";
       assertEquals(expected, parse(example));

       example = "field1,field2[(search & old) ~3 horse]";
       expected = "(+spanNear([field1:search, field1:horse], 3, false) +spanNear([field1:old, field1:horse], 3, false)) (+spanNear([field2:search, field2:horse], 3, false) +spanNear([field2:old, field2:horse], 3, false))";
       assertEquals(expected, parse(example));

       example = "field1[search | old ~3 horse]";
       expected = "(field1:search spanNear([field1:old, field1:horse], 3, false))";
       assertEquals(expected, parse(example));

       parser.makeAllTermsFuzzy(true);
       example = "meat & old cleaver | mike ~3 (dirty man)";
       expected = "(+allFields:meat~0.5 +allFields:old~0.5 +allFields:cleaver~0.5) (+spanNear([fuzzy(allFields:mike), fuzzy(allFields:dirty)], 3, false) +spanNear([fuzzy(allFields:mike), fuzzy(allFields:man)], 3, false))";
       assertEquals(expected, parse(example));
       parser.makeAllTermsFuzzy(false);

       example = "goat-valley";
       expected = "spanNear([allFields:goat, allFields:valley], 1, true)";
       assertEquals(expected, parse(example));

       example = "goat -- valley";
       expected = "allFields:[goat TO valley]";
       assertEquals(expected, parse(example));

       example = "goat \\-- valley";
       expected = "+allFields:goat +allFields:valley";
       assertEquals(expected, parse(example));

       example = "goat \\- valley";
       expected = "+allFields:goat +allFields:valley";
       assertEquals(expected, parse(example));

       example = "goat - valley";
       expected = "allFields:{goat TO valley}";
       assertEquals(expected, parse(example));
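
The date samples above turn input like 8/5/82 into a sortable yyyyMMdd term, and 6/34/02 rolls over to 20020704. A lenient java.text.SimpleDateFormat gives the same results, so here is a rough sketch of that normalization. Whether Qsol does it exactly this way is an assumption on my part; the class DateSketch, the "M/d/yy" input pattern, and the 1950 pivot for two-digit years are all illustrative choices.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical illustration; not Qsol's actual date handling.
final class DateSketch {
    // Converts month/day/year input to a yyyyMMdd string.
    static String normalize(String input) {
        SimpleDateFormat in = new SimpleDateFormat("M/d/yy", Locale.ROOT);
        // Assumption: two-digit years fall in 1950-2049, so 82 -> 1982 and 02 -> 2002.
        // Years written with four digits are taken literally.
        in.set2DigitYearStart(new GregorianCalendar(1950, Calendar.JANUARY, 1).getTime());
        // SimpleDateFormat is lenient by default, so day 34 of June rolls into July.
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        try {
            return out.format(in.parse(input.trim()));
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a date: " + input, e);
        }
    }
}
```

With this sketch, normalize("8/5/82") returns "19820805", normalize("03/23/2004") returns "20040323", and normalize("6/34/02") returns "20020704".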



