Hi,
I want to make sure that every comma (,) and semicolon (;) is followed
by a space before tokenizing.
The idea is to then use a WhitespaceTokenizer, which keeps the commas but
still splits the phrase in a case like:
"I bought red apples,green pears,and yellow oranges"
I'm thinking of extending CharFilter to "inject" a space after each
comma. My questions are:
1) Does this make sense, or am I completely off here?
2) Are there any code examples of CharFilter implementations that
inject a character?
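For illustration, here is a minimal sketch of the injection idea using only java.io, so it runs without Lucene on the classpath. InjectSpaceReader, nextChar() and drain() are hypothetical names, and this is a plain Reader, not a Lucene CharFilter. As far as I know, Lucene's CharFilter is itself a Reader wrapper, so the same nextChar() logic should carry over, but a real CharFilter would probably also need to correct offsets for each inserted space (e.g. something like BaseCharFilter's offset-correction map, depending on the Lucene version) so that highlighting still lines up with the original text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/**
 * Hypothetical Reader wrapper that injects a space after every ',' or ';'
 * not already followed by whitespace. Plain java.io only: this is NOT a
 * Lucene CharFilter and does no offset correction.
 */
public class InjectSpaceReader extends Reader {
    private static final int NONE = -2;   // marker: no char held back
    private final Reader in;
    private boolean afterPunct = false;   // last emitted char was ',' or ';'
    private int pending = NONE;           // char read early, emitted on the next call

    public InjectSpaceReader(Reader in) {
        this.in = in;
    }

    /** Returns the next output char, or -1 at end of input. */
    private int nextChar() throws IOException {
        int c;
        if (pending != NONE) {
            c = pending;
            pending = NONE;
        } else {
            c = in.read();
        }
        if (afterPunct) {
            afterPunct = false;
            // Inject a space and hold c back until the next call.
            if (c != -1 && !Character.isWhitespace(c)) {
                pending = c;
                return ' ';
            }
        }
        // Assumption: only ',' and ';' trigger injection.
        if (c == ',' || c == ';') {
            afterPunct = true;
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = nextChar();
            if (c == -1) {
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Demo helper: reads a Reader fully into a String. */
    public static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        try {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "I bought red apples,green pears,and yellow oranges";
        String out = drain(new InjectSpaceReader(new StringReader(text)));
        System.out.println(out);
        // Splitting on whitespace, roughly what a whitespace tokenizer does:
        System.out.println(Arrays.toString(out.split("\\s+")));
    }
}
```

Running main should print "I bought red apples, green pears, and yellow oranges", and the whitespace split then keeps the comma on "apples," and "pears,". It may also be worth checking whether your Lucene version ships PatternReplaceCharFilter, which could do the same thing with a regex such as ([,;])(?=\S) replaced by "$1 ", without a custom class.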
TIA