Re: Preprocess input text before tokenizing

2016-06-24 Thread Ahmet Arslan
Hi Jaime,

Please see o.a.l.analysis.custom.CustomAnalyzer.builder() to create custom analyzers using a builder-style API.

Ahmet

On Friday, June 24, 2016 10:54 AM, Jaime wrote:
> Thank you very much, that seems to solve my issue. However, I find this a little cumbersome. I need to filter the te
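For illustration, here is a minimal sketch of the builder-style approach mentioned above. It assumes the lucene-analyzers-common module is on the classpath and that the factories are registered under the SPI names "patternReplace", "standard", and "lowercase". The class name BuilderExample, the field name "body", and the underscore-stripping pattern are hypothetical, chosen only to show a char filter running before the tokenizer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class BuilderExample {

    // Builds an analyzer whose char filter rewrites the raw text
    // before the tokenizer ever sees it.
    public static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()
            // Hypothetical preprocessing step: turn runs of '_' into a space.
            .addCharFilter("patternReplace", "pattern", "_+", "replacement", " ")
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .build();
    }

    // Helper that collects the terms an analyzer produces for a string.
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }
}
```

The same char filter line can be repeated in each builder chain, so the preprocessing does not require a hand-written subclass per analyzer. Analyzers that are not assembled from factories, such as SpanishAnalyzer, would still need their tokenizer and filter chain spelled out in the builder.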

Re: Preprocess input text before tokenizing

2016-06-24 Thread Jaime
Thank you very much, that seems to solve my issue. However, I find this a little cumbersome. I need to filter the text before any tokenizing takes place, so I have to implement a filtered version of every analyzer I'm using (StandardAnalyzer, SpanishAnalyzer, and a custom analyzer right now)

Re: Preprocess input text before tokenizing

2016-06-23 Thread Ahmet Arslan
Hi, CharFilters (zero or more) are the way to manipulate text before it reaches the tokenizer. I think initReader is the method where you want to plug in char filters. https://github.com/apache/lucene-solr/blob/master/lucene/analysis/morfologik/src/java/org/apache/lucene/analysis/uk/UkrainianMorfologikAnalyzer.java
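As a rough sketch of that suggestion (not the code from the linked analyzer), an Analyzer subclass can override initReader to wrap the incoming Reader in a CharFilter. The class name PreprocessingAnalyzer, the field name "body", and the underscore pattern are hypothetical; PatternReplaceCharFilter and StandardTokenizer are assumed to be available from the Lucene analysis modules on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class PreprocessingAnalyzer extends Analyzer {

    // Hypothetical preprocessing rule: runs of '_' become a single space.
    private static final Pattern UNDERSCORES = Pattern.compile("_+");

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Plain tokenizer with no token filters, to keep the sketch short.
        return new TokenStreamComponents(new StandardTokenizer());
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Called before the tokenizer consumes the text, so the tokenizer
        // only ever sees the rewritten character stream.
        return new PatternReplaceCharFilter(UNDERSCORES, " ", reader);
    }

    // Helper that collects the terms an analyzer produces for a string.
    public static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }
}
```

Without the char filter, StandardTokenizer would be expected to keep "foo_bar" as one token, since underscores join word characters under its word-break rules. With the filter in place the tokenizer sees "foo bar" and emits two tokens. Several char filters can be chained by nesting the wrapped readers inside initReader.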