Hi,

CharFilters (zero or more of them) are the way to manipulate text before it reaches the tokenizer. I think Analyzer's initReader is the method you want to override to plug in char filters. For an example, see: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
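Something along these lines might work as a starting point. It is only a rough sketch, not tested against your setup: the class name SeparatorAnalyzer, the tokenize helper, and the two mappings ("_" to a space, "'" to nothing) are made up for illustration, and it assumes a Lucene 5+ style Analyzer API with the core and analyzers-common jars on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer: rewrites characters before StandardTokenizer sees them.
public class SeparatorAnalyzer extends Analyzer {

    // Illustrative mappings only; replace them with your own characters.
    private static final NormalizeCharMap CHAR_MAP = buildCharMap();

    private static NormalizeCharMap buildCharMap() {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("_", " "); // treat underscore as a word separator
        builder.add("'", "");  // drop apostrophes completely
        return builder.build();
    }

    // Called by the analyzer before the tokenizer gets the reader,
    // so there is no need to subclass the tokenizer or touch setReader.
    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new MappingCharFilter(CHAR_MAP, reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // A stock tokenizer is enough; token filters could be chained here.
        return new TokenStreamComponents(new StandardTokenizer());
    }

    // Hypothetical helper that collects the terms produced for a string.
    public static List<String> tokenize(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (Analyzer analyzer = new SeparatorAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("foo_bar o'neil"));
    }
}
```

With this approach the mapping lives in the analyzer, so the tokenizer itself stays unmodified. MappingCharFilter also corrects offsets, so highlighting should still point at the original text.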
Ahmet

On Thursday, June 23, 2016 6:47 PM, Jaime <j.par...@estructure.es> wrote:

Hello,

I want to change the input text before tokenizing. I think I just need to use some characters as word separators, and maybe remove some others completely. I was planning to use MappingCharFilterFactory to replace some chars with " " and others with "", but I feel like I'm not on the right track.

First, I've implemented a custom analyzer to use my custom tokenizer. My idea was to inherit from StandardTokenizer and call MappingCharFilterFactory.create(reader) from within setReader. However, setReader is final, so I can't override it.

Is there a better way to do this? In any case, how should I use MappingCharFilter if I really needed it?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org