[
https://issues.apache.org/jira/browse/LUCENE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950677#comment-15950677
]
Robert Muir commented on LUCENE-7762:
-------------------------------------
We did a lot of work to remove analyzer customizations and options. Really they
should be "examples" and you should use CustomAnalyzer if you want to tweak
behavior.
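As a rough sketch of the CustomAnalyzer route, the chain below approximates what EnglishAnalyzer does while raising the tokenizer's max token length. The class name, the helper method, and the value 1024 are hypothetical. The filter list is an assumption about EnglishAnalyzer's current chain (without the stem-exclusion set), not a guaranteed match.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class EnglishLikeAnalyzerSketch {

    // Hypothetical helper, not part of Lucene: builds an English-style
    // analysis chain with a caller-chosen max token length.
    static Analyzer build(int maxTokenLength) throws IOException {
        return CustomAnalyzer.builder()
            // "maxTokenLength" is a parameter of the standard tokenizer factory.
            .withTokenizer("standard", "maxTokenLength", Integer.toString(maxTokenLength))
            // Assumed approximation of EnglishAnalyzer's filters.
            .addTokenFilter("englishPossessive")
            .addTokenFilter("lowercase")
            // With no "words" argument the stop filter factory falls back
            // to its default English stop word set.
            .addTokenFilter("stop")
            .addTokenFilter("porterStem")
            .build();
    }

    public static void main(String[] args) throws IOException {
        // 1024 is an arbitrary illustrative value, larger than the default 255.
        try (Analyzer analyzer = build(1024)) {
            System.out.println("Built analyzer: " + analyzer);
        }
    }
}
```

Because the chain is assembled from factory names and string parameters, changing the limit is a configuration tweak and needs no new setter on EnglishAnalyzer.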
Otherwise we run into lots of backwards-compatibility issues. Or cases like
this one: why should EnglishAnalyzer's API be bound to StandardTokenizer at
all? It should not show its cards; these things make it hard or impossible to
improve it later. And why just EnglishAnalyzer? If it's going to show its cards,
why shouldn't all the other StandardTokenizer-using analyzers show their cards
too? I think consistency is important.
These analyzers are still defined with Java code (versus configuration), but
this is also not good. Such options make it hard to improve them from that
perspective too.
And really the only reason a setter is wanted is because they are defined with
Java code today. If they weren't, be honest, you'd just tweak the configuration.
For all these reasons, I'm not sure we should do this.
> Add EnglishAnalyzer.setMaxTokenLength
> -------------------------------------
>
> Key: LUCENE-7762
> URL: https://issues.apache.org/jira/browse/LUCENE-7762
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master (7.0), 6.6
>
>
> I think EnglishAnalyzer should also let you change the default (255) max
> token length of the StandardTokenizer it's invoking.
> I will also fold the javadoc fixes from LUCENE-7760 here.