> we already did this in 3.1 by making a base FilteringTokenFilter class?
> a regex filter is trivial if you subclass this (we could add something like
> this untested code to the .pattern package or whatever)
>
> public class PatternRemoveFilter extends FilteringTokenFilter {
> private final
Sure, I'd be willing to do that. I'll create an issue and then get working
on the code and tests.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Can-I-omit-ShingleFilter-s-filler-tokens-tp2926009p2933250.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
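Robert's class is cut off in the archive, so the following is not his code. It is a minimal, stdlib-only Java model of the idea he describes: a base class walks the tokens and a subclass's accept() decides what to keep, and the subclass drops anything a regex fully matches. SimpleFilteringFilter and PatternRemoveSketch are hypothetical names, and the model works on plain strings, ignoring the position increments that a real Lucene TokenFilter has to account for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stdlib-only model of the FilteringTokenFilter idea: the base
// class walks the tokens and asks the subclass whether to keep each one.
abstract class SimpleFilteringFilter {
    // Return true to keep the token, false to drop it.
    protected abstract boolean accept(String token);

    // Apply accept() to every token, preserving the order of kept tokens.
    public final List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (accept(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}

// Hypothetical regex remover: drops every token the pattern matches in full.
class PatternRemoveSketch extends SimpleFilteringFilter {
    private final Pattern pattern;

    PatternRemoveSketch(Pattern pattern) {
        this.pattern = pattern;
    }

    @Override
    protected boolean accept(String token) {
        return !pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Matches shingles in which "_" appears as a whole space-separated term.
        Pattern filler = Pattern.compile("(?:.* )?_(?: .*)?");
        PatternRemoveSketch remover = new PatternRemoveSketch(filler);
        System.out.println(remover.filter(List.of("one two", "two _", "_ four", "four five")));
        // prints [one two, four five]
    }
}
```

Keeping the keep/drop decision in a single accept() hook means a stopword-style filter and a regex filter differ only in that one method.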
Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, May 12, 2011 1:15 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote:
> > A thought: one way to do #
On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote:
> A thought: one way to do #1 without modifying ShingleFilter: if there were a
> StopFilter variant that accepted regular expressions instead of a stopword
> list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a
> full m
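Steven's regex can be tried out with java.util.regex alone. This sketch applies the pattern quoted above to the bigrams from the "one two three four five" example in this thread. It assumes whole-string matching via Matcher.matches() and that "_" is the filler token. FillerRegexCheck and dropFillerShingles are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FillerRegexCheck {
    // The regex from the message; "_" is assumed to be the filler token.
    static final Pattern FILLER_SHINGLE = Pattern.compile("_ .*|.* _| _ ");

    // Keep only the shingles the regex does NOT match as a whole string.
    static List<String> dropFillerShingles(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String s : shingles) {
            if (!FILLER_SHINGLE.matcher(s).matches()) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> bigrams = List.of("one two", "two _", "_ four", "four five");
        System.out.println(dropFillerShingles(bigrams));
        // prints [one two, four five]
    }
}
```

The example only covers bigrams. With longer shingles, which alternative fires depends on where the filler falls, so it is worth testing the pattern against the shingle sizes you actually configure.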
> -Original Message-
> From: Elmo Bleek [mailto:barb...@gmail.com]
> Sent: Thursday, May 12, 2011 12:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> I have found that simply having StopFilter before ShingleFilter does the
I have found that simply having StopFilter before ShingleFilter does the
trick for #2. However, I have also been working on trying to accomplish #1,
don't create shingles across stop words. I am currently under the impression
that this will take modifying ShingleFilter. Does anyone have any
suggest
>> #1 (don't create shingles across stopwords).
>>
>>> -Original Message-
>>> From: Robert Muir [mailto:rcm...@gmail.com]
>>> Sent: Wednesday, May 11, 2011 9:02 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Can I omit
>> To: java-user@lucene.apache.org
>> Subject: Re: Can I omit ShingleFilter's filler tokens
>>
>> another idea is to .setEnablePositionIncrements(false) on your
>> stopfilter.
>>
>> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote:
>> > Hi
Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, May 11, 2011 9:02 AM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> another idea is to .setEnablePositionIncrements(false) on your
> stopfilter.
another idea is to .setEnablePositionIncrements(false) on your stopfilter.
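To see why disabling position increments on the StopFilter avoids filler tokens, here is a deliberately simplified Java model. The class PositionIncrementModel is hypothetical and is not Lucene's StopFilter or ShingleFilter code. It assumes, as described in this thread, that the shingle step pads each skipped position with "_", and it builds bigrams only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model only, not Lucene's StopFilter/ShingleFilter implementation.
class PositionIncrementModel {
    static final String FILLER = "_";

    // A term plus its position increment (1 = directly after the previous term).
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Drop stopwords. With increments enabled, the next kept term carries the
    // skipped positions (posInc > 1); otherwise every kept term gets posInc 1.
    static List<Token> removeStopwords(List<String> words, Set<String> stopwords,
                                       boolean enablePositionIncrements) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (String w : words) {
            if (stopwords.contains(w)) {
                skipped++;
            } else {
                out.add(new Token(w, enablePositionIncrements ? 1 + skipped : 1));
                skipped = 0;
            }
        }
        return out;
    }

    // Bigrams only: pad each skipped position with the filler, then pair neighbors.
    static List<String> bigrams(List<Token> tokens) {
        List<String> positions = new ArrayList<>();
        for (Token t : tokens) {
            for (int i = 1; i < t.posInc; i++) {
                positions.add(FILLER);
            }
            positions.add(t.term);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            out.add(positions.get(i) + " " + positions.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three", "four", "five");
        Set<String> stop = Set.of("three");
        System.out.println(bigrams(removeStopwords(words, stop, true)));
        // prints [one two, two _, _ four, four five]
        System.out.println(bigrams(removeStopwords(words, stop, false)));
        // prints [one two, two four, four five]
    }
}
```

With increments enabled, the model reproduces the "two _" and "_ four" shingles Steven describes. With them disabled there is no gap to fill, so no filler appears and the shingles span the removed stopword ("two four").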
On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote:
> Hi Bill,
>
> I can think of two possible interpretations of "removing filler tokens":
>
> 1. Don't create shingles across stopwords, e.g. for text "one two three four
Hi Bill,
I can think of two possible interpretations of "removing filler tokens":
1. Don't create shingles across stopwords, e.g. for text "one two three four
five" and stopword "three", bigrams only, you'd get ("one two", "four five"),
instead of the current ("one two", "two _", "_ four", "fou
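Interpretation #1 amounts to "treat each stopword as a hard boundary and build shingles only within runs of non-stopwords". The following is a minimal stdlib Java sketch of that expected output for bigrams. The class and method names are hypothetical. It is not a ShingleFilter modification; it only pins down the result such a change would have to produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SegmentedBigrams {
    // Stopwords end the current run, so no bigram ever spans one.
    static List<String> bigramsWithinSegments(List<String> words, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        String prev = null; // previous non-stopword in the current run, or null at a boundary
        for (String w : words) {
            if (stopwords.contains(w)) {
                prev = null; // boundary: start a new run after the stopword
                continue;
            }
            if (prev != null) {
                out.add(prev + " " + w);
            }
            prev = w;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three", "four", "five");
        System.out.println(bigramsWithinSegments(words, Set.of("three")));
        // prints [one two, four five]
    }
}
```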