[ 
https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352511#comment-15352511
 ] 

Adrien Grand commented on LUCENE-7355:
--------------------------------------

We want a way to tell the analyzer to normalize a piece of text: it should 
not tokenize (this is why it replaces the tokenizer), and it should apply all 
normalization filters (lowercasing, ASCII folding, etc.) but no 
transformations (stop word removal, stemming, etc.). I don't think we can do 
this without adding a new API to the Analyzer class (or at least a parameter 
to an existing method)? The main use-case is the parsing of multi-term queries 
in query parsers. Once we have such an API, query parsers would no longer need 
the {{lowercaseExpandedTerms}} parameter, since they could call the new method 
directly and it would do the right thing out of the box: not only lowercasing 
but also e.g. ASCII folding, which currently cannot be done at all. Now that I 
am thinking about it more, I don't think we need the low-level TokenStream API 
as a return value for this new method, so maybe we could make it just 
{{String normalize(String field, String text)}}. That would probably make it 
easier to use?
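
To make the intended behaviour concrete, here is a rough sketch in plain Java 
of what a method with that signature might do. It is not the patch and it does 
not use any Lucene classes: the class name {{NormalizeSketch}} is made up for 
illustration, the {{field}} argument is ignored, and stripping combining marks 
after Unicode decomposition is only an approximation of ASCII folding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only; not part of Lucene or of the attached patch.
class NormalizeSketch {

    // Normalizes the whole input as a single unit: no tokenization,
    // no stop word removal, no stemming.
    static String normalize(String field, String text) {
        // 'field' is unused in this sketch; a real analyzer could select a
        // per-field normalization chain based on it.
        String lowered = text.toLowerCase(Locale.ROOT);

        // Approximate ASCII folding: decompose accented characters (NFD),
        // then drop the combining marks. Characters without a decomposition
        // (for example "ø") are left unchanged by this shortcut.
        String decomposed = Normalizer.normalize(lowered, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The wildcard character is left alone so that the query parser can
        // still build a wildcard query from the normalized text.
        System.out.println(normalize("title", "Café*"));        // prints "cafe*"
        // Whitespace and stop words are kept: the text is not tokenized.
        System.out.println(normalize("title", "The New York")); // prints "the new york"
    }
}
```

A query parser would call such a method on the raw text of a prefix, wildcard 
or range term instead of consulting {{lowercaseExpandedTerms}}.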

> Leverage MultiTermAwareComponent in query parsers
> -------------------------------------------------
>
>                 Key: LUCENE-7355
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7355
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7355.patch, LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing 
> in query parsers when it comes to analysis of multi-term queries. However, 
> since query parsers just take an analyzer and since analyzers do not 
> propagate the information about what to do for multi-term analysis, query 
> parsers cannot do the right thing out of the box.



