Ahh, I don't know of a better way. I can imagine complex solutions involving something akin to WordDelimiterFilter... and I can imagine that that would be ridiculously expensive to maintain when there are really simple solutions like you're looking at.
Mostly I was curious about your use-case....

Erick

On Sat, Nov 3, 2012 at 11:35 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:

> well, my main goal is to use a ShingleFilter that will only take shingles
> that are not separated by commas etc.
>
> for example, the phrase:
>
> "red apples, green tomatoes, and brown potatoes"
>
> should yield the shingles "red apples", "green tomatoes", "and brown",
> "brown potatoes"; but not "apples green" and not "tomatoes and" as those
> are separated by commas.
>
> the problem with the common tokenizers is that they get rid of the commas
> so if I use a ShingleFilter after them there's no way to tell if there was
> a comma there or not.
>
> (another option I consider is to add an Attribute to specify if there was
> a comma before or after a token)
>
> if there's a better way -- I'm open to suggestions,
>
> Igal
>
> On 11/3/2012 8:10 PM, Erick Erickson wrote:
>
>> So I've gotta ask... _why_ do you want to inject the spaces?
>> If it's just to break this up into tokens, wouldn't something like
>> LetterTokenizer do? Assuming you aren't interested in
>> leaving in numbers.... Or even StandardTokenizer unless you have
>> e-mail & etc.
>>
>> Or what about PatternReplaceCharFilter?
>>
>> FWIW,
>> Erick
>>
>> On Sat, Nov 3, 2012 at 9:22 PM, Igal Sapir <i...@getrailo.org> wrote:
>>
>>> You're right. I'm not sure what I was thinking.
>>>
>>> Thanks for all your help,
>>>
>>> Igal
>>>
>>> On Nov 3, 2012 5:44 PM, "Robert Muir" <rcm...@gmail.com> wrote:
>>>
>>>> On Sat, Nov 3, 2012 at 8:32 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:
>>>>
>>>>> hi Robert,
>>>>>
>>>>> thank you for your replies.
>>>>>
>>>>> I couldn't find much documentation/examples of this, but this is what I
>>>>> came up with (below). is that the way I'm supposed to use the
>>>>> MappingCharFilter?
>>>>
>>>> You don't need to extend anything.
>>>> You also don't want to create a NormalizeCharMap for each reader
>>>> (that's way too heavy)
>>>>
>>>> Just build the NormalizeCharMap once, and pass it to
>>>> MappingCharFilter's constructor.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
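
For reference, the shingling behaviour Igal describes can be sketched in plain Java, without any Lucene classes. This is only an illustration of the expected output for the example phrase, not a ShingleFilter implementation; the class name CommaAwareShingles and the method shingles are made up for this sketch, and it assumes two-word shingles with the comma as the only boundary character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the desired output only.
class CommaAwareShingles {

    /** Returns two-word shingles, never pairing words across a comma. */
    static List<String> shingles(String text) {
        List<String> result = new ArrayList<>();
        // Split on commas first, so each segment acts as a shingle boundary.
        for (String segment : text.split(",")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments, e.g. from ",," or a trailing comma
            }
            String[] words = trimmed.split("\\s+");
            // Pair each word with its immediate successor inside the segment.
            for (int i = 0; i + 1 < words.length; i++) {
                result.add(words[i] + " " + words[i + 1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [red apples, green tomatoes, and brown, brown potatoes]
        System.out.println(shingles("red apples, green tomatoes, and brown potatoes"));
    }
}
```

In a real analysis chain the boundary information would have to travel with the token stream (for instance via the Attribute idea mentioned in the thread); the sketch sidesteps that by working on the raw string.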