Hi,
Can I remove the filler token _ from the n-gram tokens generated by a
ShingleFilter?
I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter,
and ShingleFilter to create phrase n-grams. The ShingleFilter inserts
FILLER_TOKENs in place of the stopwords, but I don't w
me text and stopword, bigrams only, you'd get ("one two", "two four",
>> "four five").
>> >
>> > Which one did you have in mind? #2 can be achieved by adding
>> PositionFilter after StopFilter and before ShingleFilter. I think #1
>
I meant that #2 is what I'm trying for, so this should work (I got my
numbers mixed up). Thanks again.
Bill
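
For anyone reading this thread later, here is a rough plain-Java sketch of the
difference being discussed. It does not use Lucene at all: the class
ShingleSketch and its methods are hypothetical names made up for illustration,
and the input text ("one two three four five" with "three" as the stopword) is
an assumption inferred from the bigrams quoted above. It only models the idea
that keeping the position gap left by a stopword yields "_" fillers in the
bigrams, while ignoring the gap yields bigrams that span the removed word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of the behaviour discussed in this thread; it does NOT use Lucene.
class ShingleSketch {
    static final String FILLER = "_";

    // Removes stopwords. When keepGaps is true, each removed word is replaced
    // by a FILLER placeholder, standing in for the position gap it leaves behind.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords, boolean keepGaps) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                out.add(token);
            } else if (keepGaps) {
                out.add(FILLER);
            }
        }
        return out;
    }

    // Builds word bigrams by joining each token with its successor.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example input; "three" plays the role of the stopword.
        List<String> text = List.of("one", "two", "three", "four", "five");
        Set<String> stop = Set.of("three");

        // Gaps kept: fillers appear in the bigrams.
        System.out.println(bigrams(removeStopwords(text, stop, true)));
        // Gaps ignored: bigrams span the removed stopword.
        System.out.println(bigrams(removeStopwords(text, stop, false)));
    }
}
```

With gaps kept, the sketch produces "one two", "two _", "_ four", "four five";
with gaps ignored, it produces "one two", "two four", "four five", which is the
result I was after.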
On 5/11/11, William Koscho wrote:
> #1 is what I'm trying for, so I'll give setPositionIncrements(false) a
> try. Thanks for everyone's help.
>
> Bill
>
> On 5/11