Hi,

I want to make sure that every comma (,) and semicolon (;) is followed by a space before tokenizing.

The idea is to then use a WhitespaceTokenizer, which will keep the commas attached to the tokens but still split the phrase in a case like:

    "I bought red apples,green pears,and yellow oranges"

I'm thinking of extending CharFilter to "inject" a space after each comma or semicolon. My questions are:

1) Does this make sense, or am I completely off here?

2) Are there any code examples of CharFilter implementations that inject a character?
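
To make the intent concrete, here is a rough sketch of the transformation I have in mind, in plain Java with no Lucene classes. The class and method names (SeparatorPadding, padSeparators, whitespaceTokens) are made up for illustration, and the split on whitespace only stands in for what I expect WhitespaceTokenizer to do. It does not deal with offset correction, which I assume a real CharFilter would have to handle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorPadding {
    // A comma or semicolon that is directly followed by a non-whitespace
    // character. A separator at the very end of the input is left alone.
    private static final Pattern UNPADDED = Pattern.compile("([,;])(?=\\S)");

    // Hypothetical helper: insert one space after every unpadded separator.
    public static String padSeparators(String text) {
        return UNPADDED.matcher(text).replaceAll("$1 ");
    }

    // Stand-in for whitespace tokenization: split on runs of whitespace.
    public static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String input = "I bought red apples,green pears,and yellow oranges";
        String padded = padSeparators(input);
        // prints: I bought red apples, green pears, and yellow oranges
        System.out.println(padded);
        // The commas stay attached to "apples," and "pears,".
        System.out.println(whitespaceTokens(padded));
    }
}
```

I believe Lucene ships a PatternReplaceCharFilter that takes a regex pattern and a replacement string, so the same pattern might work there without a custom CharFilter, but I have not verified that against the version I am using.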

TIA
