On Sat, Nov 3, 2012 at 7:35 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:
> hi,
>
> I want to make sure that every comma (,) and semi-colon (;) is followed by a
> space prior to tokenizing.
>
> the idea is to then use a WhitespaceTokenizer which will keep commas but
> still split the phrase in a case like:
>
> "I bought red apples,green pears,and yellow oranges"
>
> I'm thinking of extending CharFilter to "inject" a space after the comma.
> my questions are:
>
> 1) does it make sense or am I completely off here?
>
> 2) are there any code examples of CharFilter implementations with
> injection of a char?

Can't you just use something like MappingCharFilter with a single mapping of "," to ", "?
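
To make the effect of such a mapping concrete, here is a minimal plain-Java sketch. It is an illustration only and does not call Lucene: the class and method names (CommaSpacingDemo, injectSpaces, whitespaceTokens) are hypothetical, and the whitespace split merely stands in for WhitespaceTokenizer. In Lucene 4.x the equivalent mappings would, as far as I know, be registered through NormalizeCharMap.Builder and handed to MappingCharFilter; check the javadocs for the version you are on. Unlike a real CharFilter, this sketch does not track offsets back to the original text.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; these names are hypothetical, not Lucene API.
class CommaSpacingDemo {

    // Append a space after every ',' and ';', i.e. the mappings
    // "," -> ", " and ";" -> "; " applied in a single pass.
    static String injectSpaces(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(c);
            if (c == ',' || c == ';') {
                out.append(' ');
            }
        }
        return out.toString();
    }

    // Stand-in for a whitespace tokenizer: split on runs of whitespace,
    // leaving punctuation attached to the preceding word.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String raw = "I bought red apples,green pears,and yellow oranges";
        String mapped = injectSpaces(raw);
        // prints: I bought red apples, green pears, and yellow oranges
        System.out.println(mapped);
        // prints: [I, bought, red, apples,, green, pears,, and, yellow, oranges]
        System.out.println(whitespaceTokens(mapped));
    }
}
```

Input that already has a space after the comma ends up with two spaces, which is harmless here because the split treats any run of whitespace as a single separator.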