Looking For Tokenizer With Custom Delimiter

2018-01-08 Thread Armins Stepanjans
Hi, I am looking for a tokenizer where I could specify the delimiters by which the words are tokenized. For example, if I choose the delimiters ' ' and '_', the string "foo__bar doo" would be tokenized into: "foo", "", "bar", "doo" (the analyzer could further filter out empty tokens, since h…
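
A minimal sketch of this behavior using the CharTokenizer lambda factories from Lucene 7.x's analyzers-common module (the approach suggested in the replies below). Note that CharTokenizer never emits zero-length tokens, so the empty token between the two underscores is dropped automatically, which matches the filtering caveat above:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.util.CharTokenizer;

    public class DelimiterTokenizerDemo {
      public static void main(String[] args) throws Exception {
        // Treat ' ' and '_' as separators; everything else is token content.
        Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
            c -> c == ' ' || c == '_');
        tok.setReader(new StringReader("foo__bar doo"));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        tok.reset();
        while (tok.incrementToken()) {
          System.out.println(term.toString()); // prints: foo, bar, doo
        }
        tok.end();
        tok.close();
      }
    }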

Re: Looking For Tokenizer With Custom Delimiter

2018-01-08 Thread Armins Stepanjans
Thanks, I was able to use the module; however, my Analyzer is not invoked upon IndexWriter.addDocument(), even though I pass it to the constructor when creating the IndexWriterConfig, and when I test the Analyzer by calling it explicitly using the instructions in http://lucene.apache.org/core/7_1_0/cor…
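
The indexing code is not shown in the thread, so this is only an assumption: a frequent cause of this symptom is indexing with StringField, which is stored as a single verbatim token and never passes through the analyzer; only TextField is analyzed. A sketch of the full wiring, with the field name "body" and the in-memory directory chosen purely for illustration:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.util.CharTokenizer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class CustomAnalyzerIndexing {
      public static void main(String[] args) throws Exception {
        Analyzer analyzer = new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
                c -> c == ' ' || c == '_');
            return new TokenStreamComponents(tok);
          }
        };
        try (Directory dir = new RAMDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
          Document doc = new Document();
          // TextField runs through the analyzer; a StringField here would be
          // indexed as one verbatim token and the analyzer would never fire.
          doc.add(new TextField("body", "foo__bar doo", Field.Store.YES));
          writer.addDocument(doc);
        }
      }
    }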

RE: Looking For Tokenizer With Custom Delimiter

2018-01-08 Thread Uwe Schindler
Hi, it is part of the analyzers-common module; it is not included in Lucene's core. Lucene's core module only has a single analyzer (StandardAnalyzer) and some helper classes, but not the full set of multi-purpose and language-specific ones. Uwe - Uwe Schindler, Achterdiek 19, D-28357 Bremen…
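
In Maven terms that module is the org.apache.lucene:lucene-analyzers-common artifact (7.1.0 at the time of this thread), which has to be on the classpath alongside lucene-core. A small illustration of how the imports split across the two jars:

    // From lucene-core: the base classes plus the one bundled analyzer.
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    // From lucene-analyzers-common: everything else, including CharTokenizer.
    // These resolve only when that jar is on the classpath, even though the
    // package names look like they belong to core.
    import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.util.CharTokenizer;

    public class ModuleCheck {
      public static void main(String[] args) {
        // Instantiating one class from each module verifies the setup.
        System.out.println(new StandardAnalyzer().getClass());   // core
        System.out.println(new WhitespaceAnalyzer().getClass()); // analyzers-common
      }
    }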

Re: Looking For Tokenizer With Custom Delimiter

2018-01-08 Thread Armins Stepanjans
Thanks for the solution; however, I am unable to access the CharTokenizer class when I import it using: import org.apache.lucene.analysis.util.*; although I am able to access classes directly under analysis (or analysis.standard) just fine with the import statement: import org.apache.lucene.analysis.*;…

RE: High CPU usage observed while searching with lucene 6.2.1

2018-01-08 Thread jayanpraman
Thanks Uwe for the info. Correct, it makes sense. -Jayan

RE: Looking For Tokenizer With Custom Delimiter

2018-01-08 Thread Uwe Schindler
Moin, this is plain easy to customize with lambdas! E.g., an elegant way to create a tokenizer which behaves exactly like WhitespaceTokenizer plus LowerCaseFilter is: Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(Character::isWhitespace, Character::toLowerCase); Adjust with lambdas and you…
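
For completeness, the normalizer argument in that factory is optional, so the same pattern covers the delimiters from the original question without any lowercasing (assuming the Lucene 7.x CharTokenizer API quoted above):

    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.util.CharTokenizer;

    public class TokenizerVariants {
      public static void main(String[] args) {
        // Uwe's example: whitespace splitting plus lowercasing.
        Tokenizer wsLower = CharTokenizer.fromSeparatorCharPredicate(
            Character::isWhitespace, Character::toLowerCase);

        // One-argument form, splitting on ' ' and '_' with no normalization.
        Tokenizer custom = CharTokenizer.fromSeparatorCharPredicate(
            c -> c == ' ' || c == '_');
      }
    }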
