[jira] Created: (LUCENE-2605) queryparser parses on whitespace

Robert Muir (JIRA) Mon, 16 Aug 2010 20:31:44 -0700

queryparser parses on whitespace
--------------------------------

                 Key: LUCENE-2605
                 URL: https://issues.apache.org/jira/browse/LUCENE-2605
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Robert Muir
             Fix For: 3.1, 4.0



The queryparser parses input on whitespace, and sends each whitespace-separated 
term to its own independent token stream.

This breaks the following at query time, because they can't see across 
whitespace boundaries:
* n-gram analysis
* shingles 
* synonyms (especially multi-word synonyms in whitespace-separated languages)
* languages where a 'word' can contain whitespace (e.g. Vietnamese)

It's also rather unexpected, as users think their 
charfilters/tokenizers/tokenfilters will do the same thing at index time and 
query time, but in many cases they can't. Instead, preferably the queryparser 
would parse around only real 'operators'.
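
To make the shingle case concrete, here is a minimal sketch in plain Java. It 
does not use the Lucene API: bigramShingles is a hypothetical stand-in for an 
analysis chain that emits two-word shingles, and analyzePerChunk only mimics 
the behavior described above (split on whitespace first, then analyze each 
chunk on its own). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSplitDemo {

    // Hypothetical stand-in for an analyzer that emits two-word shingles.
    static List<String> bigramShingles(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            shingles.add(words[i] + " " + words[i + 1]);
        }
        return shingles;
    }

    // Mimics the reported behavior: the query is split on whitespace first,
    // and each chunk is analyzed independently of its neighbors.
    static List<String> analyzePerChunk(String query) {
        List<String> shingles = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            shingles.addAll(bigramShingles(chunk));
        }
        return shingles;
    }

    public static void main(String[] args) {
        String text = "new york city";
        // The whole text is visible to the analyzer, as at index time.
        System.out.println("whole text: " + bigramShingles(text));
        // Each chunk is a single word, so no shingle can ever form.
        System.out.println("per chunk:  " + analyzePerChunk(text));
    }
}
```

With the whole text visible, the sketch yields the shingles "new york" and 
"york city"; analyzed one chunk at a time it yields none, so a query would 
never match the shingles produced at index time.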


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


