[ 
https://issues.apache.org/jira/browse/LUCENE-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459873#comment-16459873
 ] 

Shawn Heisey commented on LUCENE-7960:
--------------------------------------

I've had a look at the PR.

Changing the signature of an existing constructor isn't a good idea.  Lucene is 
a public API, and user code that calls that constructor must continue to 
compile and work when Lucene is upgraded.  We should add a new constructor and 
have the existing constructor(s) call that one with default values.
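
To illustrate the delegation pattern I mean, here is a minimal sketch.  The 
class name GramFilterSketch and its accessors are hypothetical stand-ins, not 
the actual Lucene classes or the code in the PR, and the keepShortTerms name 
is only the one suggested in the issue description.

```java
// Hypothetical stand-in for a filter class; this is NOT the Lucene source.
final class GramFilterSketch {
    private final int minGram;
    private final int maxGram;
    // Option name borrowed from the issue description; assumed, not final.
    private final boolean keepShortTerms;

    /** Existing constructor: signature untouched, so current callers still compile. */
    GramFilterSketch(int minGram, int maxGram) {
        // Delegate with the default that preserves today's behavior.
        this(minGram, maxGram, false);
    }

    /** New constructor that carries the additional option. */
    GramFilterSketch(int minGram, int maxGram, boolean keepShortTerms) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
        this.minGram = minGram;
        this.maxGram = maxGram;
        this.keepShortTerms = keepShortTerms;
    }

    int minGram() { return minGram; }
    int maxGram() { return maxGram; }
    boolean keepShortTerms() { return keepShortTerms; }
}
```

With this shape, a caller using the two-argument form sees no change, and 
only callers that opt in through the three-argument form get the new 
behavior.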

The only question about that is whether the existing constructor should be 
deprecated in stable and removed in master.  I'm not sure who to ask.

There are some variable renames.  They don't look like problems, especially 
because the visibility is private, but I'd like to get the opinion of someone 
who has deeper Lucene knowledge.

I'm having a difficult time following the modifications to the filter logic.  
Some of the modifications look like they're not directly related to 
implementing this issue, but I can't tell for sure.


> NGram filters -- add option to keep short terms
> -----------------------------------------------
>
>                 Key: LUCENE-7960
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7960
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Shawn Heisey
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When ngram or edgengram filters are used, any terms that are shorter than the 
> minGramSize are completely removed from the token stream.
> This is probably 100% what was intended, but I've seen it cause a lot of 
> problems for users.  I am not suggesting that the default behavior be 
> changed.  That would be far too disruptive to the existing user base.
> I do think there should be a new boolean option, with a name like 
> keepShortTerms, that defaults to false, to allow the short terms to be 
> preserved.
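
The behavior described in the issue can be sketched roughly as follows.  
This is a simplified illustration on plain strings, not the Lucene 
TokenFilter implementation: the class ShortTermSketch and the method 
edgeGrams are hypothetical, only edge (prefix) grams are shown, and lengths 
are counted in Java chars rather than code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
final class ShortTermSketch {

    /** Returns the prefix grams of term with lengths from minGram to maxGram. */
    static List<String> edgeGrams(String term, int minGram, int maxGram,
                                  boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        if (term.length() < minGram) {
            // Current behavior drops the term entirely; the proposed option
            // would pass it through unchanged instead.
            if (keepShortTerms) {
                out.add(term);
            }
            return out;
        }
        int limit = Math.min(maxGram, term.length());
        for (int len = minGram; len <= limit; len++) {
            out.add(term.substring(0, len));
        }
        return out;
    }
}
```

With minGram=2 and maxGram=3, the term "a" yields nothing when 
keepShortTerms is false and yields "a" when it is true, while a longer term 
such as "abcd" yields "ab" and "abc" either way.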



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
