Hi,
I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane), and it was indexed as
"er l n"; some characters were dropped during indexing.
Here is my code:
protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
{
    final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
    src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
    TokenStream tok = new ClassicFilter(src);
    tok = new LowerCaseFilter(getVersion(), tok);
    tok = new StopFilter(getVersion(), tok, stopwords);
    tok = new ASCIIFoldingFilter(tok); // to enable accent-insensitive search
    return new Analyzer.TokenStreamComponents(src, tok)
    {
        @Override
        protected void setReader(final Reader reader) throws IOException
        {
            src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
            super.setReader(reader);
        }
    };
}
Am I missing anything? Is this the expected behavior for my input, or is
there a reason behind this unexpected result?
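
In case it helps with diagnosis, here is a minimal JDK-only sketch for listing
the code points of a term and whether Java classifies each one as a letter.
It is not part of the analyzer above and does not use Lucene; the class name
CodePointDump and the method describe are made up for illustration, and Java's
letter classification may not match what the tokenizer uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the analyzer: lists each code point of a term.
public class CodePointDump {

    // Returns one line per code point: its hex value and whether
    // Character.isLetter considers it a letter.
    static List<String> describe(String term) {
        List<String> out = new ArrayList<>();
        term.codePoints().forEach(cp ->
            out.add(String.format("U+%04X letter=%b", cp, Character.isLetter(cp))));
        return out;
    }

    public static void main(String[] args) {
        // Pass the term as the first argument; the fallback is the first three
        // characters of the term, written as escapes to avoid source-encoding issues.
        String term = args.length > 0 ? args[0] : "\u24B6e\u0158";
        for (String line : describe(term)) {
            System.out.println(line);
        }
    }
}
```

Running it with the full term as the argument prints one line per character,
which can be compared against the characters that survived in the index.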
--
Regards,
Chitra