On Fri, Aug 21, 2009 at 2:18 PM, Valery<khame...@gmail.com> wrote:
>
> Simon Willnauer wrote:
>>
>> I already responded... again...
>>
> Sorry, I was still writing my answer and saw your post right after sending.
>
> Simon Willnauer wrote:
>>
>> Tokenizer splits the input stream into tokens (Token.java) and
>> TokenFilter subclasses operate on those. I expect from a Tokenizer
>> that it provides me a stream of tokens :) - how those tokens are
>> created is the responsibility of the Tokenizer.
>
> According to your requirements:
>
> * one programmer will write a simplistic Tokenizer that converts the whole
> char input into 1 huge token.
>
> * another programmer will write a simplistic Tokenizer that converts each
> single char of the input into a 1-char token. It will end up in a huge
> number of 1-char tokens.
>
> Moreover, both claim the job is done in a brilliant way, because the
> Tokenizer is based on a 1-line statement in Java...
>
> Who did the work better?
>
> That said, I'd love to hear more specific requirements about Tokenizer to
> avoid the above odd deliveries :)

The answer is again "it depends". If you need two tokenizers, one creating
tokens by dividing at non-letters and another one dividing at whitespace,
then a Tokenizer that outputs every single char is a good superclass for
those two. See LetterTokenizer / WhitespaceTokenizer and their common
superclass CharTokenizer.
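
To illustrate that design, here is a minimal sketch in plain Java with no
Lucene dependency. The names SimpleCharTokenizer, SimpleLetterTokenizer and
SimpleWhitespaceTokenizer are hypothetical stand-ins, not Lucene's classes.
As far as I know, Lucene's CharTokenizer exposes a similar isTokenChar hook,
but it reads from a Reader and emits tokens through the TokenStream API,
which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the CharTokenizer idea; not Lucene's actual class.
abstract class SimpleCharTokenizer {
    // Subclasses decide which characters may appear inside a token.
    protected abstract boolean isTokenChar(char c);

    // Walks the input char by char and collects maximal runs of token chars.
    public List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-token char ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Divides at non-letters.
class SimpleLetterTokenizer extends SimpleCharTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Divides at whitespace only.
class SimpleWhitespaceTokenizer extends SimpleCharTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return !Character.isWhitespace(c);
    }
}
```

For the input "C++ and .NET", the letter variant yields C, and, NET, while
the whitespace variant yields C++, and, .NET. Which of the two is "better"
depends entirely on what you want to search for.
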
Asking who did a better job is not valid without specifying the
requirements. Anyway, does WhitespaceTokenizer solve your problem?
As Robert said, have a look at the smartcn stuff; this is the other
extreme. It always depends.

simon

>
> regards
> Valery
>
> --
> View this message in context:
> http://www.nabble.com/Any-Tokenizator-friendly-to-C%2B%2B%2C-C-%2C-.NET%2C-etc---tp25063175p25078755.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org