[ https://issues.apache.org/jira/browse/LUCENE-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866916#comment-16866916 ]
Mike Sokolov commented on LUCENE-8866:
--------------------------------------
+1. If people have more precise normalization requirements, they can encode
them in their dictionary. I think we can presume this is not noisy user data
and that it has already been cleaned.
> Remove ICU dependency of kuromoji tools/test-tools
> --------------------------------------------------
>
> Key: LUCENE-8866
> URL: https://issues.apache.org/jira/browse/LUCENE-8866
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-8866.patch
>
>
> The tooling has an off-by-default option to normalize entries,
> currently using the ICU API.
> But I think since it's off-by-default, and just doing NFKC normalization at
> dictionary-build-time, it's a better tradeoff to use the JDK here?
> I would rather remove the ICU dependency for the tooling and look at
> simplifying the build to have fewer modules (e.g. investigate moving the
> tooling and tests into src/java and src/tools, so that [[email protected]]
> new tests in LUCENE-8863 are running by default, the dictionary tool is
> shipped as a command-line tool in the JAR, etc.).
> "ant regenerate" should be enough to prevent any chicken-and-egg problems in
> the dictionary construction code, so I don't think we need separate modules to
> enforce it.
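
For readers unfamiliar with the JDK side of this tradeoff, here is a minimal
sketch of NFKC normalization using only java.text.Normalizer, with no ICU on
the classpath. The class name NfkcSketch, the helper normalizeEntry, and its
boolean flag are hypothetical names chosen for illustration; they are not taken
from the attached patch or from the kuromoji tooling.

```java
import java.text.Normalizer;

// Illustrative only: NfkcSketch and normalizeEntry are hypothetical names,
// not part of the kuromoji dictionary tooling or LUCENE-8866.patch.
class NfkcSketch {

    // Applies NFKC normalization to a dictionary entry when the
    // (off-by-default) option is enabled; otherwise returns it unchanged.
    static String normalizeEntry(String entry, boolean normalize) {
        if (!normalize) {
            return entry;
        }
        return Normalizer.normalize(entry, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FF76 (half-width KA) + U+FF9E (half-width voiced sound mark)
        String halfWidthGa = "\uFF76\uFF9E";
        // U+FF21 U+FF22 U+FF23 (full-width Latin letters A, B, C)
        String fullWidthAbc = "\uFF21\uFF22\uFF23";

        // NFKC folds both compatibility forms to their canonical equivalents.
        System.out.println(normalizeEntry(halfWidthGa, true));   // U+30AC (full-width GA)
        System.out.println(normalizeEntry(fullWidthAbc, true));  // ABC
        // With the option off, the entry passes through untouched.
        System.out.println(normalizeEntry(fullWidthAbc, false).equals(fullWidthAbc));
    }
}
```

The JDK's Unicode tables track the Java release rather than the ICU release, so
results for very recently added characters may differ between the two; for
curated dictionary data that is presumably an acceptable tradeoff.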