RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Uwe Schindler
> we already did this in 3.1 by making a base FilteringTokenFilter class? > a regex filter is trivial if you subclass this (we could add something like > this > untested code to the .pattern package or whatever) > > public class PatternRemoveFilter extends FilteringTokenFilter { > private final

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Elmo Bleek
Sure, I'd be willing to do that. I'll create an issue and then get working on code and tests.

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Steven A Rowe
Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Thursday, May 12, 2011 1:15 PM > To: java-user@lucene.apache.org > Subject: Re: Can I omit ShingleFilter's filler tokens > > On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote: > > A thought: one way to do #

Re: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Robert Muir
On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote: > A thought: one way to do #1 without modifying ShingleFilter: if there were a > StopFilter variant that accepted regular expressions instead of a stopword > list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a > full m
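
The regex idea above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the StopFilter variant proposed in the thread: the class name `FillerShingleRegexSketch`, the method `dropFillerShingles`, and the exact pattern `_ .*|.* _|_` (adapted from the regex quoted in the message, assuming "_" is the filler token and a single space separates tokens) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: post-filters shingle strings.
public class FillerShingleRegexSketch {
    // Assumed pattern: a shingle that starts with, ends with, or consists
    // solely of the filler token "_" (tokens separated by one space).
    private static final Pattern FILLER_SHINGLE = Pattern.compile("_ .*|.* _|_");

    /** Returns the shingles that do not contain a leading or trailing filler token. */
    public static List<String> dropFillerShingles(List<String> shingles) {
        return shingles.stream()
                .filter(s -> !FILLER_SHINGLE.matcher(s).matches())
                .collect(Collectors.toList());
    }
}
```

Inside Lucene, the same match would presumably run per token in a filter placed after ShingleFilter; this sketch only shows the matching logic on plain strings.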

RE: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Steven A Rowe
> -Original Message- > From: Elmo Bleek [mailto:barb...@gmail.com] > Sent: Thursday, May 12, 2011 12:51 PM > To: java-user@lucene.apache.org > Subject: Re: Can I omit ShingleFilter's filler tokens > > I have found that simply having StopFilter before ShingleFilter does the

Re: Can I omit ShingleFilter's filler tokens

2011-05-12 Thread Elmo Bleek
I have found that simply having StopFilter before ShingleFilter does the trick for #2. However, I have also been working on trying to accomplish #1, don't create shingles across stop words. I am currently under the impression that this will take modifying ShingleFilter. Does anyone have any suggest

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
>> #1 (don't create shingles across stopwords). >> >>> -Original Message- >>> From: Robert Muir [mailto:rcm...@gmail.com] >>> Sent: Wednesday, May 11, 2011 9:02 AM >>> To: java-user@lucene.apache.org >>> Subject: Re: Can I omit

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread William Koscho
>> To: java-user@lucene.apache.org >> Subject: Re: Can I omit ShingleFilter's filler tokens >> >> another idea is to .setEnablePositionIncrements(false) on your >> stopfilter. >> >> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote: >> > Hi

RE: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Steven A Rowe
Original Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Wednesday, May 11, 2011 9:02 AM > To: java-user@lucene.apache.org > Subject: Re: Can I omit ShingleFilter's filler tokens > > another idea is to .setEnablePositionIncrements(false) on your > stopfilter.

Re: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Robert Muir
another idea is to .setEnablePositionIncrements(false) on your stopfilter. On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote: > Hi Bill, > > I can think of two possible interpretations of "removing filler tokens": > > 1. Don't create shingles across stopwords, e.g. for text "one two three four

RE: Can I omit ShingleFilter's filler tokens

2011-05-11 Thread Steven A Rowe
Hi Bill, I can think of two possible interpretations of "removing filler tokens": 1. Don't create shingles across stopwords, e.g. for text "one two three four five" and stopword "three", bigrams only, you'd get ("one two", "four five"), instead of the current ("one two", "two _", "_ four", "fou
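
Interpretation 1 can be sketched in plain Java, independent of Lucene. This is only an illustration of the desired output under stated assumptions, not how ShingleFilter works or would be modified: the class `NoCrossStopwordShingles` and the method `bigrams` are hypothetical names, and tokens are assumed to arrive as an already-split list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: word bigrams that never span a stopword.
public class NoCrossStopwordShingles {
    /** Builds bigrams from tokens, restarting the window whenever a stopword is seen. */
    public static List<String> bigrams(List<String> tokens, Set<String> stopwords) {
        List<String> shingles = new ArrayList<>();
        String previous = null; // null: start of text, or the last token was a stopword
        for (String token : tokens) {
            if (stopwords.contains(token)) {
                previous = null; // break the window so no shingle crosses the gap
                continue;
            }
            if (previous != null) {
                shingles.add(previous + " " + token);
            }
            previous = token;
        }
        return shingles;
    }
}
```

For the text "one two three four five" with stopword "three", `bigrams` returns ("one two", "four five"), matching the output described for interpretation 1.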