Can I omit ShingleFilter's filler tokens

2011-05-10 Thread William Koscho
Hi, can I remove the filler token _ from the n-gram tokens that are generated by a ShingleFilter? I'm using a chain of filters (ClassicFilter, StopFilter, LowerCaseFilter, and ShingleFilter) to create phrase n-grams. The ShingleFilter inserts FILLER_TOKENs in place of the stopwords, but I don't w

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
me text and stopword, bigrams only, you'd get ("one two", "two four",
>> "four five").
>> >
>> > Which one did you have in mind?  #2 can be achieved by adding
>> PositionFilter after StopFilter and before ShingleFilter.  I think #1
>

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
I meant I'm trying for #2, so this should work (got my numbers mixed up). Thanks again,
Bill

On 5/11/11, William Koscho wrote:
> #1 is what I'm trying for, so I'll give setPositionIncrements(false) a
> try. Thanks for everyone's help.
>
> Bill
>
> On 5/11
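
For readers following the thread, here is a minimal plain-Java sketch of the two outcomes being discussed. It does not call Lucene at all; it only imitates what happens to bigrams when the position gap left by a removed stopword is kept (a filler token appears) versus collapsed (the effect the thread attributes to placing PositionFilter after StopFilter and before ShingleFilter). The class name `ShingleSketch`, the method `bigrams`, and the input text "one two three four five" with stopword "three" are assumptions made for illustration, since the snippets above elide the original example text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: a simplified model of stopword gaps and
// bigram shingles, not Lucene's ShingleFilter implementation.
final class ShingleSketch {
    // Stand-in for the "_" filler token mentioned in the thread.
    static final String FILLER = "_";

    // Builds bigrams from tokens after dropping stopwords.
    // keepGaps = true  : each removed stopword leaves a gap, shown as FILLER.
    // keepGaps = false : gaps are collapsed, so neighbours become adjacent.
    static List<String> bigrams(List<String> tokens, Set<String> stopwords, boolean keepGaps) {
        List<String> stream = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                stream.add(token);
            } else if (keepGaps) {
                stream.add(FILLER);
            }
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < stream.size(); i++) {
            shingles.add(stream.get(i) + " " + stream.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("one", "two", "three", "four", "five");
        Set<String> stop = new HashSet<>(Arrays.asList("three"));
        System.out.println(bigrams(text, stop, true));   // [one two, two _, _ four, four five]
        System.out.println(bigrams(text, stop, false));  // [one two, two four, four five]
    }
}
```

The second call reproduces the ("one two", "two four", "four five") result quoted in the thread. The sketch is simplified and makes no claim about how Lucene handles edge cases such as consecutive or trailing stopwords.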