Oddly, I had the exact same thought, although it's not obvious from the name (and common usage) of trim-like functions that they'd also give you a way to specify a maximum length (applied after trimming, I'd assume).

The other thought I had was that TrimFilter should optionally take a list of characters to trim. Then I thought of regex, especially for specifying character classes like \w... naaahhhhhh, we just went there. Either way, I think I'd prefer a separate filter, if for no other reason than that by building a length limit into the trim filter, you'd implicitly disallow keeping spaces at the beginning or end of your tokens. I don't have a use case for wanting that, but there's no good reason I can think of to couple these two different functions.
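Just to make the idea concrete, here's a rough, untested sketch of what such a separate filter might look like (CharTrimFilter is a made-up name, not anything that exists in Lucene or Solr):

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class CharTrimFilter extends TokenFilter {
    private final CharTermAttribute termAtt =
        addAttribute(CharTermAttribute.class);
    private final Set<Character> trimChars;

    public CharTrimFilter(TokenStream input, Set<Character> trimChars) {
        super(input);
        this.trimChars = trimChars;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        char[] buf = termAtt.buffer();
        int start = 0;
        int end = termAtt.length();
        // strip configured characters from the front of the term...
        while (start < end && trimChars.contains(buf[start])) {
            start++;
        }
        // ...and from the back
        while (end > start && trimChars.contains(buf[end - 1])) {
            end--;
        }
        if (start > 0) {
            // shift what's left back to the start of the term buffer
            System.arraycopy(buf, start, buf, 0, end - start);
        }
        termAtt.setLength(end - start);
        return true;
    }
}

That keeps the two concerns composable: anyone who wants both behaviors can just chain this with a truncating filter like Geoff's below.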
FWIW,
Erick

On Wed, Nov 14, 2012 at 2:05 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:

> Hi Geoff,
> cool, that will eliminate possible regex pitfalls in schema.xml.
>
> I was thinking about enhancing an existing filter into a multi-purpose
> filter, e.g. TrimFilter: if maxLength is set, then also limit the termAtt
> to maxLength. That would keep the number of available filters small,
> especially for simple tasks. Any thoughts from the core developers about
> this idea?
>
> Regards
> Bernd
>
> Am 13.11.2012 17:56, schrieb Geoff Cooney:
> > Hi,
> >
> > I've been following this thread and happen to have a simple
> > TruncatingFilter class I wrote for the same purpose. I think this
> > should do what you want:
> >
> > import java.io.IOException;
> >
> > import org.apache.lucene.analysis.TokenFilter;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> >
> > /** Truncates each term to at most maxLength characters. */
> > public class TruncatingFilter extends TokenFilter {
> >     private final CharTermAttribute termAtt =
> >         addAttribute(CharTermAttribute.class);
> >     private final int maxLength;
> >
> >     public TruncatingFilter(TokenStream input, int maxLength) {
> >         super(input);
> >         this.maxLength = maxLength;
> >     }
> >
> >     @Override
> >     public boolean incrementToken() throws IOException {
> >         if (input.incrementToken()) {
> >             // truncate over-long terms in place
> >             if (termAtt.length() > maxLength) {
> >                 termAtt.setLength(maxLength);
> >             }
> >             return true;
> >         } else {
> >             return false;
> >         }
> >     }
> > }
> >
> > Cheers,
> > Geoff
> >
> > On Tue, Nov 13, 2012 at 7:54 AM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> >
> >> There's nothing in Solr that I know of that does this. It would be a
> >> pretty easy custom filter to create, though.
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir <rcm...@gmail.com> wrote:
> >>
> >>> On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling
> >>> <bernd.fehl...@uni-bielefeld.de> wrote:
> >>>> By the way, why does the TrimFilter option updateOffsets default to
> >>>> false? Just to keep it backwards compatible?
> >>>
> >>> In my opinion this option should be removed.
> >>>
> >>> TokenFilters shouldn't muck with offsets, for a lot of reasons, but
> >>> especially because it's too late to interact with any CharFilter.
> >>>
> >>> This is the tokenizer's job.