I meant I'm trying for #2 so this should work (got my numbers mixed up). Thanks again
Bill

On 5/11/11, William Koscho <wkos...@gmail.com> wrote:
> #1 is what I'm trying for, so I'll give setEnablePositionIncrements(false) a
> try. Thanks for everyone's help.
>
> Bill
>
> On 5/11/11, Steven A Rowe <sar...@syr.edu> wrote:
>> Yes, StopFilter.setEnablePositionIncrements(false) will almost certainly get
>> higher throughput than inserting PositionFilter. Like PositionFilter, this
>> will buy you #2 (create shingles as if stopwords were never there), but not
>> #1 (don't create shingles across stopwords).
>>
>>> -----Original Message-----
>>> From: Robert Muir [mailto:rcm...@gmail.com]
>>> Sent: Wednesday, May 11, 2011 9:02 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Can I omit ShingleFilter's filler tokens
>>>
>>> another idea is to .setEnablePositionIncrements(false) on your
>>> stopfilter.
>>>
>>> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe <sar...@syr.edu> wrote:
>>> > Hi Bill,
>>> >
>>> > I can think of two possible interpretations of "removing filler tokens":
>>> >
>>> > 1. Don't create shingles across stopwords, e.g. for text "one two three
>>> > four five" and stopword "three", bigrams only, you'd get ("one two",
>>> > "four five"), instead of the current ("one two", "two _", "_ four",
>>> > "four five").
>>> >
>>> > 2. Create shingles as if the stopwords were never there, e.g. for the
>>> > same text and stopword, bigrams only, you'd get ("one two", "two four",
>>> > "four five").
>>> >
>>> > Which one did you have in mind? #2 can be achieved by adding
>>> > PositionFilter after StopFilter and before ShingleFilter. I think #1
>>> > requires ShingleFilter modifications.
>>> >
>>> > Steve
>>> >
>>> >> -----Original Message-----
>>> >> From: William Koscho [mailto:wkos...@gmail.com]
>>> >> Sent: Wednesday, May 11, 2011 12:05 AM
>>> >> To: java-user@lucene.apache.org
>>> >> Subject: Can I omit ShingleFilter's filler tokens
>>> >>
>>> >> Hi,
>>> >>
>>> >> Can I remove the filler token _ from the n-gram tokens that are
>>> >> generated by a ShingleFilter?
>>> >>
>>> >> I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter,
>>> >> and ShingleFilter to create phrase n-grams. The ShingleFilter inserts
>>> >> FILLER_TOKENs in place of the stopwords, but I don't want them.
>>> >>
>>> >> How can I omit the filler tokens?
>>> >>
>>> >> thanks
>>> >> Bill
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
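
For anyone reading this thread later, here is a rough sketch of the three
bigram outputs Steve describes, using his example text "one two three four
five" with stopword "three". It is plain Java and does not call Lucene at all;
the class name ShingleSketch and its method names are made up for
illustration, and it only claims to reproduce the example from the thread,
not ShingleFilter's actual implementation or its edge-case behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not Lucene code.
public class ShingleSketch {
    static final String FILLER = "_";

    // Behavior Bill is seeing: each removed stopword leaves a gap,
    // and the gap shows up in the bigrams as a "_" filler.
    // Assumption of this sketch: a bigram made of two gaps is skipped.
    static List<String> withFillers(List<String> tokens, Set<String> stopwords) {
        List<String> slots = new ArrayList<>();
        for (String t : tokens) {
            slots.add(stopwords.contains(t) ? null : t); // null marks a gap
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String a = slots.get(i);
            String b = slots.get(i + 1);
            if (a == null && b == null) {
                continue;
            }
            out.add((a == null ? FILLER : a) + " " + (b == null ? FILLER : b));
        }
        return out;
    }

    // Interpretation #1: never build a shingle across a stopword.
    static List<String> noShinglesAcrossStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String t : tokens) {
            if (stopwords.contains(t)) {
                prev = null; // the stopword breaks the run
                continue;
            }
            if (prev != null) {
                out.add(prev + " " + t);
            }
            prev = t;
        }
        return out;
    }

    // Interpretation #2: build shingles as if the stopwords were never there.
    static List<String> asIfStopwordsAbsent(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String t : tokens) {
            if (stopwords.contains(t)) {
                continue; // drop the stopword, keep the run going
            }
            if (prev != null) {
                out.add(prev + " " + t);
            }
            prev = t;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("one", "two", "three", "four", "five");
        Set<String> stop = Collections.singleton("three");
        System.out.println(withFillers(tokens, stop));               // [one two, two _, _ four, four five]
        System.out.println(noShinglesAcrossStopwords(tokens, stop)); // [one two, four five]
        System.out.println(asIfStopwordsAbsent(tokens, stop));       // [one two, two four, four five]
    }
}
```

In the thread, #2 is what you get from PositionFilter or from
StopFilter.setEnablePositionIncrements(false); Steve thinks #1 would need
changes to ShingleFilter itself.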