[ https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651403#comment-16651403 ]
Alan Woodward commented on LUCENE-8497:
---------------------------------------
Thanks for the pull request [~mayyas]! I've extended it to cover Solr as well,
which allows us to remove MultiTermAwareComponent entirely.
This causes some test failures in Solr's MultiTermTest, but seeing as these are
explicitly testing implementation (which has now changed), and the behaviour
itself is tested elsewhere, e.g. in TestFoldingMultitermQuery, I think we should
be OK to just remove this test? [~erickerickson] you wrote the tests originally,
does that sound reasonable to you?
> Rethink multi-term analysis handling
> ------------------------------------
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8497.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof
> checks for MultiTermAwareComponent and casts. MultiTermAwareComponent itself
> deals in AbstractAnalysisComponents, and so callers need to cast to the
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts. One
> possibility is to add `normalize` methods to CharFilterFactory and
> TokenFilterFactory that mirror their existing `create` methods. The default
> implementation would return the input unchanged, while filters that should
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which
> combines tokenization and normalization in a way that will break this API.
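
A minimal sketch of the `normalize`/`create` split described in the issue, under
stated assumptions: it does not use the real Lucene classes. A token stream is
modelled as a plain UnaryOperator<String>, and every class and method name
prefixed with "Sketch" (as well as normalizeTerm and analyzeTerm) is
hypothetical, chosen only for illustration. The point it tries to show is that a
default `normalize` returns its input unchanged, so callers can walk the whole
factory chain without instanceof checks or casts.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical, simplified model: a "stream" here is just a function applied
// to a single term, not a real Lucene TokenStream.
final class NormalizeSketch {

    /** Stand-in for a token filter factory; not the real Lucene class. */
    abstract static class SketchTokenFilterFactory {
        /** Wraps the input with this filter for full analysis. */
        abstract UnaryOperator<String> create(UnaryOperator<String> input);

        /** Default: this filter does not apply at normalization time. */
        UnaryOperator<String> normalize(UnaryOperator<String> input) {
            return input;
        }
    }

    /** Lowercasing should also apply to multi-term queries, so it delegates. */
    static final class SketchLowerCaseFactory extends SketchTokenFilterFactory {
        @Override
        UnaryOperator<String> create(UnaryOperator<String> input) {
            return term -> input.apply(term).toLowerCase(Locale.ROOT);
        }

        @Override
        UnaryOperator<String> normalize(UnaryOperator<String> input) {
            return create(input);
        }
    }

    /** A toy stemmer (strips one trailing "s"); it keeps the default normalize. */
    static final class SketchPluralStemFactory extends SketchTokenFilterFactory {
        @Override
        UnaryOperator<String> create(UnaryOperator<String> input) {
            return term -> {
                String t = input.apply(term);
                return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
            };
        }
    }

    /** Builds the normalization chain: no casts, every factory is asked. */
    static String normalizeTerm(List<SketchTokenFilterFactory> chain, String term) {
        UnaryOperator<String> stream = UnaryOperator.identity();
        for (SketchTokenFilterFactory factory : chain) {
            stream = factory.normalize(stream);
        }
        return stream.apply(term);
    }

    /** Builds the full analysis chain via create. */
    static String analyzeTerm(List<SketchTokenFilterFactory> chain, String term) {
        UnaryOperator<String> stream = UnaryOperator.identity();
        for (SketchTokenFilterFactory factory : chain) {
            stream = factory.create(stream);
        }
        return stream.apply(term);
    }

    public static void main(String[] args) {
        List<SketchTokenFilterFactory> chain =
            List.of(new SketchLowerCaseFactory(), new SketchPluralStemFactory());
        System.out.println(normalizeTerm(chain, "Foxes")); // lowercased only
        System.out.println(analyzeTerm(chain, "Foxes"));   // lowercased and stemmed
    }
}
```

In this sketch the stemmer is skipped at normalization time simply because it
inherits the default, which is the type-safe behaviour the issue is after.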