[ https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352539#comment-15352539 ]
Uwe Schindler commented on LUCENE-7355:
---------------------------------------
bq. I don't think we need the low-level TokenStream API as a return value for
this new method, so maybe we could make it just String normalize(String field,
String text). That would probably make it easier to use?
I was thinking the same. Then we wouldn't even need a KeywordTokenizer! We
could just populate the termAttribute with the full term and call the filters.
This would allow us to remove the dependency on analysis-common from Analyzer
(core): just use the one from the document/field API to generate a single-value
token stream (we use it for non-tokenized fields). Of course, this can only work
if the token filters don't split terms, which a multi-term aware filter should
never do.
These are just thoughts! We could make the normalize method final (like
tokenStream), taking a String and returning a String.
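
To make the idea concrete, here is a rough sketch in plain Java. It is not
Lucene code: NormalizerSketch and its filter list are hypothetical stand-ins,
and each "filter" is modelled as a String-to-String function so that it cannot
split a term by construction. It only illustrates the shape of a
String normalize(String field, String text) method that treats the whole input
as one term and passes it through the filters without a tokenizer.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for an analyzer's normalization chain; not Lucene API. */
final class NormalizerSketch {
    // Each step maps one term to exactly one term, so it can never split the input.
    private final List<UnaryOperator<String>> filters;

    NormalizerSketch(List<UnaryOperator<String>> filters) {
        this.filters = List.copyOf(filters);
    }

    /**
     * String in, String out. The field name is accepted only to mirror the
     * proposed signature; this sketch applies the same chain to every field.
     */
    public String normalize(String field, String text) {
        String term = text; // the full text is the single term, no tokenizer involved
        for (UnaryOperator<String> filter : filters) {
            term = filter.apply(term);
        }
        return term;
    }

    public static void main(String[] args) {
        NormalizerSketch normalizer = new NormalizerSketch(
            List.<UnaryOperator<String>>of(s -> s.toLowerCase(Locale.ROOT), String::trim));
        // A wildcard term keeps its '*' and is only lowercased and trimmed.
        System.out.println(normalizer.normalize("title", "  Foo Bar* "));
    }
}
```

With a real analyzer the chain would be the multi-term aware filters configured
for the given field rather than a fixed list.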
> Leverage MultiTermAwareComponent in query parsers
> -------------------------------------------------
>
> Key: LUCENE-7355
> URL: https://issues.apache.org/jira/browse/LUCENE-7355
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7355.patch, LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing
> in query parsers when it comes to analysis of multi-term queries. However,
> since query parsers just take an analyzer and since analyzers do not
> propagate the information about what to do for multi-term analysis, query
> parsers cannot do the right thing out of the box.