[
https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352511#comment-15352511
]
Adrien Grand commented on LUCENE-7355:
--------------------------------------
We want a way to tell the analyzer to normalize a piece of text: it should not
tokenize (this is why it replaces the tokenizer), and it should apply all
normalization filters (lowercasing, ASCII folding, etc.) but none of the
transformations (stop word removal, stemming, etc.). I don't think we can do
that without adding a new API to the Analyzer class (or at least a parameter to
an existing method). The main use case is the parsing of multi-term queries in
query parsers. Once we have such an API, query parsers would no longer need the
{{lowercaseExpandedTerms}} parameter, since they could call this new method
directly and it would do the right thing out of the box: not only lowercasing
but also e.g. ASCII folding, which there is currently no way to do. Now that I
think about it more, I don't think this new method needs to return the
low-level TokenStream API, so maybe we could make it just
{{String normalize(String field, String text)}}. That would probably make it
easier to use?
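
To make the proposed contract concrete, here is a minimal plain-Java sketch of
what a {{String normalize(String field, String text)}} call could do. It is an
illustration only, not Lucene code and not the attached patch: the class name
NormalizeSketch is hypothetical, lowercasing plus accent stripping via
java.text.Normalizer stands in for the analyzer's real normalization filters,
and the field argument is accepted but ignored.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for the proposed Analyzer method; not Lucene code.
class NormalizeSketch {

    // Applies normalization only: no tokenization, no stop word removal,
    // no stemming. The whole input comes back as a single string.
    static String normalize(String field, String text) {
        // `field` is unused in this sketch; a real analyzer could use it
        // to select a per-field normalization chain.
        String lowered = text.toLowerCase(Locale.ROOT);
        // Decompose accented characters, then drop the combining marks.
        // This is only a rough approximation of ASCII folding.
        String decomposed = Normalizer.normalize(lowered, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "Caf\u00e9" is "Café"; the stop word "THE" and the space are kept.
        System.out.println(normalize("title", "Caf\u00e9 THE"));
    }
}
```

A query parser handling a prefix query such as R\u00e9sum* could pass the
prefix text through a method of this shape before building the query, instead
of relying on {{lowercaseExpandedTerms}}.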
> Leverage MultiTermAwareComponent in query parsers
> -------------------------------------------------
>
> Key: LUCENE-7355
> URL: https://issues.apache.org/jira/browse/LUCENE-7355
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7355.patch, LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing
> in query parsers when it comes to analysis of multi-term queries. However,
> since query parsers just take an analyzer and since analyzers do not
> propagate the information about what to do for multi-term analysis, query
> parsers cannot do the right thing out of the box.