A thought on one way to do #1 without modifying ShingleFilter: if there were a 
StopFilter variant that accepted regular expressions instead of a stopword 
list, you could configure it with a regex like /_ .*|.* _|.* _ .*/ (assuming a full 
match is required, i.e. implicit beginning and end anchors), and place it in 
the analysis pipeline after ShingleFilter to throw out shingles with filler 
tokens in them.

(I think it would be useful to generalize StopFilter to allow for more sources 
of stoppage, rather than just creating a StopRegexFilter with no relation to 
StopFilter.)
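
To make the regex idea concrete, here is a rough sketch in plain Java. It is not a Lucene TokenFilter and does not use any Lucene API; the class ShingleRegexStop and its filter() method are made up for illustration. It assumes shingles are plain strings joined by a single space and that the filler token is "_". The alternative for a filler in the middle of a shingle is written as ".* _ .*" so that it can succeed under full-match semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: drops shingles that contain a filler token.
class ShingleRegexStop {
    // Assumed conventions: filler token "_" and a single space between shingle parts.
    // Alternatives: filler at the start, at the end, or in the middle of the shingle.
    static final Pattern FILLER = Pattern.compile("_ .*|.* _|.* _ .*");

    static List<String> filter(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String shingle : shingles) {
            // matches() succeeds only if the whole string matches,
            // i.e. it behaves as if the pattern had begin and end anchors.
            if (!FILLER.matcher(shingle).matches()) {
                kept.add(shingle);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> shingles = Arrays.asList(
            "please divide", "divide _", "_ sentence",
            "into _ shingles", "sentence into");
        System.out.println(filter(shingles));
    }
}
```

A real implementation would presumably do the same check on each token's term text inside a filter placed after ShingleFilter, and would also have to decide what to do with position increments for the dropped shingles.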

Steve

> -----Original Message-----
> From: Elmo Bleek [mailto:barb...@gmail.com]
> Sent: Thursday, May 12, 2011 12:51 PM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
> 
> I have found that simply having StopFilter before ShingleFilter does the
> trick for #2. However, I have also been working on trying to accomplish
> #1, i.e. not creating shingles across stop words. I am currently under
> the impression that this will take modifying ShingleFilter. Does anyone
> have any suggestions?
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-I-omit-ShingleFilter-s-filler-tokens-tp2926009p2932604.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
