> we already did this in 3.1 by making a base FilteringTokenFilter class?
> a regex filter is trivial if you subclass this (we could add something like
> this untested code to the .pattern package or whatever)
>
> public class PatternRemoveFilter extends FilteringTokenFilter {
> private final
Sure, I'd be willing to do that. I'll create an issue and then get working
on code and tests.
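
For readers following the regex idea, here is a minimal plain-Java sketch of what such a pattern would do to a list of shingle strings. It is not Lucene code and does not use FilteringTokenFilter; the class name FillerShingleDemo and the method dropFillerShingles are hypothetical, and the pattern is the one quoted in this thread, applied as a full match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: filters plain strings, not a Lucene TokenStream.
public class FillerShingleDemo {
    // Regex quoted in the thread; matches() requires the whole shingle to match.
    private static final Pattern FILLER = Pattern.compile("_ .*|.* _| _ ");

    /** Returns the shingles that do not start or end with the "_" filler token. */
    public static List<String> dropFillerShingles(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String shingle : shingles) {
            if (!FILLER.matcher(shingle).matches()) {
                kept.add(shingle);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> in = List.of("quick fox", "_ fox", "quick _", "fox jumps");
        System.out.println(dropFillerShingles(in)); // prints [quick fox, fox jumps]
    }
}
```

Inside Lucene the same test would presumably sit in the accept() method of a FilteringTokenFilter subclass, which is what the quoted suggestion describes.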
--
View this message in context:
http://lucene.472066.n3.nabble.com/Can-I-omit-ShingleFilter-s-filler-tokens-tp2926009p2933250.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, May 12, 2011 1:15 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote:
> > A thought: one way to do #
On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe wrote:
> A thought: one way to do #1 without modifying ShingleFilter: if there were a
> StopFilter variant that accepted regular expressions instead of a stopword
> list, you could configure it with a regex like /_ .*|.* _| _ / (assuming a
> full m
> -Original Message-
> From: Elmo Bleek [mailto:barb...@gmail.com]
> Sent: Thursday, May 12, 2011 12:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> I have found that simply having StopFilter before ShingleFilter does the
I have found that simply having StopFilter before ShingleFilter does the
trick for #2. However, I have also been working on trying to accomplish #1
(don't create shingles across stop words). I am currently under the impression
that this will take modifying ShingleFilter. Does anyone have any
suggest
>> #1 (don't create shingles across stopwords).
>>
>>> -Original Message-
>>> From: Robert Muir [mailto:rcm...@gmail.com]
>>> Sent: Wednesday, May 11, 2011 9:02 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Can I omit
>> To: java-user@lucene.apache.org
>> Subject: Re: Can I omit ShingleFilter's filler tokens
>>
>> another idea is to .setEnablePositionIncrements(false) on your
>> stopfilter.
>>
>> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe wrote:
>> > Hi
Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, May 11, 2011 9:02 AM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> another idea is to .setEnablePositionIncrements(false) on your
> stopfilter.
d before ShingleFilter. I think #1 requires ShingleFilter
> modifications.
>
> Steve
>
>> -Original Message-
>> From: William Koscho [mailto:wkos...@gmail.com]
>> Sent: Wednesday, May 11, 2011 12:05 AM
>> To: java-user@lucene.apache.org
>> Subject:
Which one did you have in mind? #2 can be achieved by adding PositionFilter
after StopFilter and before ShingleFilter. I think #1 requires ShingleFilter
modifications.
Steve
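
To make the role of position increments concrete, here is a toy model in plain Java, assuming a simplified bigram shingler that writes one "_" filler for each position skipped by a removed stopword. It is not ShingleFilter's or PositionFilter's actual implementation; the names ShingleGapDemo, Tok, bigrams and collapseGaps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics how a position gap turns into a filler token.
public class ShingleGapDemo {
    /** A term plus its position increment (1 = adjacent, 2 = one token removed before it). */
    public static final class Tok {
        final String term;
        final int posInc;
        public Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Expands each gap into "_" fillers, then joins neighbouring slots into bigrams. */
    public static List<String> bigrams(List<Tok> tokens) {
        List<String> slots = new ArrayList<>();
        for (Tok t : tokens) {
            for (int i = 1; i < t.posInc; i++) slots.add("_");
            slots.add(t.term);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            out.add(slots.get(i) + " " + slots.get(i + 1));
        }
        return out;
    }

    /** Forces every increment to 1, so removed stopwords leave no gap behind. */
    public static List<Tok> collapseGaps(List<Tok> tokens) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : tokens) out.add(new Tok(t.term, 1));
        return out;
    }

    public static void main(String[] args) {
        // "wizard of oz" after a stop filter removed "of" and left a gap.
        List<Tok> gapped = List.of(new Tok("wizard", 1), new Tok("oz", 2));
        System.out.println(bigrams(gapped));               // prints [wizard _, _ oz]
        System.out.println(bigrams(collapseGaps(gapped))); // prints [wizard oz]
    }
}
```

The second call corresponds to outcome #2: the shingle spans the removed stopword with no filler. Outcome #1, where no shingle crosses the stopword at all, is not covered by this sketch.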
> -Original Message-
> From: William Koscho [mailto:wkos...@gmail.com]
> Sent: Wednesday, May 11,
Hi,
Can I remove the filler token _ from the n-gram-tokens that are generated by
a ShingleFilter?
I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter,
and ShingleFilter to create phrase n-grams. The ShingleFilter inserts
FILLER_TOKENs in place of the stopwords, but I don't w