DictionaryCompoundWordTokenFilter does not properly add tokens from the end
of a compound word.
------------------------------------------------------------------------------------------
Key: LUCENE-3417
URL: https://issues.apache.org/jira/browse/LUCENE-3417
Project: Lucene - Java
Issue Type: Bug
Components: modules/analysis
Affects Versions: 3.3, 4.0
Reporter: Njal Karevoll
Due to an off-by-one error, a subword placed at the end of a compound word will
not get a token added to the token stream.
Example:
Dictionary: {"ab", "cd", "ef"}
word: "abcdef"
Created tokens: {"abcdef", "ab", "cd"}
Expected tokens: {"abcdef", "ab", "cd", "ef"}
Additionally, a second off-by-one error allows the filter to produce tokens
that are shorter than minSubwordSize.
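
Both errors can be reproduced outside Lucene with a small stand-alone sketch.
The code below is not taken from the Lucene source; the class SubwordSketch,
the method decompose and the "fixed" flag are hypothetical names, and the loop
bounds are only an assumption about how such off-by-one errors would arise in
a dictionary-based decomposition loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; it does not use or reproduce Lucene classes.
class SubwordSketch {

    // Returns the original word followed by every dictionary subword found.
    // fixed == false uses the assumed off-by-one bounds, fixed == true the corrected ones.
    static List<String> decompose(String word, Set<String> dict,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean fixed) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        int len = word.length();
        // Last start offset that still leaves room for a subword of minimum size.
        // The assumed buggy bound stops one position too early.
        int lastStart = fixed ? len - minSubwordSize : len - minSubwordSize - 1;
        // Shortest candidate length. The assumed buggy bound starts one character too short.
        int firstLen = fixed ? minSubwordSize : minSubwordSize - 1;
        for (int i = 0; i <= lastStart; i++) {
            for (int j = firstLen; j <= maxSubwordSize; j++) {
                if (i + j > len) {
                    break; // candidate would run past the end of the word
                }
                String candidate = word.substring(i, i + j);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("ab", "cd", "ef"));
        System.out.println(decompose("abcdef", dict, 2, 15, false)); // [abcdef, ab, cd]
        System.out.println(decompose("abcdef", dict, 2, 15, true));  // [abcdef, ab, cd, ef]

        // A one-character dictionary entry slips through when minSubwordSize is 2.
        Set<String> shortDict = new HashSet<>(Arrays.asList("a"));
        System.out.println(decompose("abcdef", shortDict, 2, 15, false)); // [abcdef, a]
        System.out.println(decompose("abcdef", shortDict, 2, 15, true));  // [abcdef]
    }
}
```

With the corrected bounds, the start offset may reach length - minSubwordSize
inclusive, so "ef" at offset 4 is considered, and candidate lengths begin at
minSubwordSize, so the one-character entry "a" is no longer emitted.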