Cool! I had forgotten about FilteringTokenFilter. Elmo, would you care to make a JIRA issue and a patch (based on Robert's code, and adding some tests) to create this? If so, this may be useful:
http://wiki.apache.org/lucene-java/HowToContribute Steve > -----Original Message----- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Thursday, May 12, 2011 1:15 PM > To: java-user@lucene.apache.org > Subject: Re: Can I omit ShingleFilter's filler tokens > > On Thu, May 12, 2011 at 1:03 PM, Steven A Rowe <sar...@syr.edu> wrote: > > A thought: one way to do #1 without modifying ShingleFilter: if there > were a StopFilter variant that accepted regular expressions instead of a > stopword list, you could configure it with a regex like /_ .*|.* _| _ / > (assuming a full match is required, i.e. implicit beginning and end > anchors), and place it in the analysis pipeline after ShingleFilter to > throw out shingles with filler tokens in them. > > > > (It think it would be useful to generalize StopFilter to allow for more > sources of stoppage, rather than just creating a StopRegexFilter with no > relation to StopFilter.) > > > > we already did this in 3.1 by making a base FilteringTokenFilter class? > a regex filter is trivial if you subclass this (we could add something > like this untested code to the .pattern package or whatever) > > public class PatternRemoveFilter extends FilteringTokenFilter { > private final Matcher matcher; > private final CharTermAttribute termAtt = > addAttribute(CharTermAttribute.class); > > public PatternRemoveFilter(boolean enablePositionIncrements, > TokenStream input, Pattern pattern) { > super(enablePositionIncrements, input); > matcher = pattern.matcher(termAtt); > } > > @Override > protected boolean accept() throws IOException { > matcher.reset(); > return !matcher.matches(); > } > } > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org