[ https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693361#comment-16693361 ]
ASF subversion and git services commented on LUCENE-8497:
---------------------------------------------------------
Commit c2bd3aed22b439168fb2bfadcdcee4fed09e4ff7 in lucene-solr's branch
refs/heads/jira/http2 from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c2bd3ae ]
LUCENE-8497: Fix reference to MultiTermAwareComponent in Solr reference guide
> Rethink multi-term analysis handling
> ------------------------------------
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch,
> LUCENE-8497.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof
> checks for MultiTermAwareComponent and casts. MultiTermAwareComponent itself
> deals in AbstractAnalysisComponents, and so callers need to cast to the
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts. One
> possibility is to add `normalize` methods to CharFilterFactory and
> TokenFilterFactory that mirror their existing `create` methods. The default
> implementation would return the input unchanged, while filters that should
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which
> combines tokenization and normalization in a way that will break this API.
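The proposal quoted above can be illustrated with a minimal sketch. It is not the committed Lucene code: the classes below (SketchTokenFilterFactory, SketchLowerCaseFilterFactory, SketchSuffixStripFilterFactory, NormalizeSketch) are hypothetical stand-ins, and terms are modelled as plain Strings rather than TokenStreams so that the example stays self-contained. It only shows the shape of the idea: every factory has a `normalize` method that mirrors `create`, the default returns its input unchanged, and a filter that should apply at normalization time delegates to `create`.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token filter factory; NOT the real Lucene class.
// Terms are plain Strings here (an assumption made to keep the sketch small).
abstract class SketchTokenFilterFactory {
    // Behaviour applied during full analysis.
    abstract String create(String term);

    // Behaviour applied at normalization time: unchanged input by default.
    String normalize(String term) {
        return term;
    }
}

// A filter that should also run at normalization time delegates to create().
class SketchLowerCaseFilterFactory extends SketchTokenFilterFactory {
    @Override
    String create(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    @Override
    String normalize(String term) {
        return create(term);
    }
}

// A filter that only makes sense for full analysis keeps the default normalize().
class SketchSuffixStripFilterFactory extends SketchTokenFilterFactory {
    @Override
    String create(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }
}

class NormalizeSketch {
    // No instanceof checks or casts: every factory can answer normalize().
    static String normalizeTerm(List<SketchTokenFilterFactory> chain, String term) {
        String out = term;
        for (SketchTokenFilterFactory factory : chain) {
            out = factory.normalize(out);
        }
        return out;
    }

    // Full analysis runs every filter in the chain via create().
    static String analyzeTerm(List<SketchTokenFilterFactory> chain, String term) {
        String out = term;
        for (SketchTokenFilterFactory factory : chain) {
            out = factory.create(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SketchTokenFilterFactory> chain = List.of(
                new SketchLowerCaseFilterFactory(),
                new SketchSuffixStripFilterFactory());
        System.out.println(normalizeTerm(chain, "Cats")); // prints "cats"
        System.out.println(analyzeTerm(chain, "Cats"));   // prints "cat"
    }
}
```

In this sketch the caller never needs to know which factories are "multi-term aware": the lowercase filter opts in by overriding `normalize`, and the suffix-stripping filter is skipped at normalization time simply because it inherits the default.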
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)