Hi,
I am looking for a tokenizer where I can specify the delimiters by which
words are tokenized. For example, if I choose ' ' and '_' as the
delimiters, the following string:
"foo__bar doo"
would be tokenized into:
"foo", "", "bar", "doo"
(The analyzer could further filter out the empty tokens.)
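For reference, the splitting behavior described above matches plain Java's String.split with both delimiters in a character class; the SplitDemo class below is only an illustration of the desired semantics, not Lucene code:

```java
import java.util.Arrays;

public class SplitDemo {
    // Split on ' ' and '_'; the -1 limit keeps trailing empty tokens too.
    static String[] tokenize(String s) {
        return s.split("[ _]", -1);
    }

    public static void main(String[] args) {
        // Adjacent delimiters ("__") produce an empty token between them.
        System.out.println(Arrays.toString(tokenize("foo__bar doo")));
        // → [foo, , bar, doo]
    }
}
```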
Thanks, I was able to use the module; however, my Analyzer is not invoked
upon IndexWriter.addDocument(), even though I pass it to the constructor
when creating the IndexWriterConfig. When I test the Analyzer by calling it
explicitly using the instructions in
http://lucene.apache.org/core/7_1_0/cor
Hi
It is part of the analyzers-common module; it is not included in Lucene's core.
Lucene's core module only has a single analyzer (StandardAnalyzer) and some
helper classes, but not the full set of multi-purpose and language-specific
ones.
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
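To get CharTokenizer on the classpath, the analyzers-common artifact has to be added as a dependency; with Maven that would look something like the fragment below (the version shown matches the 7.1.0 docs linked earlier in the thread and is only illustrative):

```xml
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-analyzers-common</artifactId>
  <version>7.1.0</version>
</dependency>
```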
Thanks for the solution; however, I am unable to access the CharTokenizer
class when I import it using:
import org.apache.lucene.analysis.util.*;
although I am able to access classes directly under analysis (or
analysis.standard) just fine with the import statement:
import org.apache.lucene.analysis.
Thanks Uwe for the info. Correct, it makes sense.
-Jayan
Moin,
This is plain easy to customize with lambdas! E.g., an elegant way to create a
tokenizer that behaves exactly like WhitespaceTokenizer combined with
LowerCaseFilter is:
Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
    Character::isWhitespace, Character::toLowerCase);
Adjust the lambdas to split and normalize on whatever characters you need.
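Without pulling in Lucene, the two method references above can be sketched in plain Java: the first is an IntPredicate marking separator characters, the second an IntUnaryOperator normalizing each kept code point. The LambdaDemo class below is only an illustration of the per-code-point semantics such a tokenizer applies, not the actual CharTokenizer implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

public class LambdaDemo {
    // The same lambdas passed to fromSeparatorCharPredicate above.
    static final IntPredicate isSeparator = Character::isWhitespace;
    static final IntUnaryOperator normalize = Character::toLowerCase;

    // Walk the input code point by code point: separators end the current
    // token, every other code point is normalized and appended.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (isSeparator.test(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(normalize.applyAsInt(cp));
            }
        });
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Foo  BAR"));
        // → [foo, bar]
    }
}
```

Note that, like WhitespaceTokenizer, runs of separators produce no empty tokens here; the separator predicate simply closes the current token.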