I'd like to run all my queries and terms through Unicode normalization
before they are indexed or executed.  I've been using the
StandardAnalyzer with pretty good luck for the past few years, so I
think I'd like to write an analyzer that wraps it and tacks a
custom TokenFilter onto the chain the StandardAnalyzer provides.
I'm really not clear, though, on how to write a TokenFilter.  My best
guess is that I want to write a class that overrides getAttribute and
uses java.text.Normalizer to normalize any TermAttribute returned by
the upstream filter.  Is that correct, or should the normalization go
somewhere else?  Are there any docs on writing custom
filters/analyzers?  I didn't have much luck finding any.
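
For concreteness, here is a rough, untested sketch of what I had in
mind (it overrides incrementToken() rather than getAttribute, which
may be what I actually need).  It assumes the 2.9/3.x attribute API,
and the class names and Version constant are just placeholders:

    import java.io.IOException;
    import java.io.Reader;
    import java.text.Normalizer;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.util.Version;

    // Applies Unicode normalization (NFC) to every term produced upstream.
    // (Each class would live in its own file.)
    public final class NormalizationFilter extends TokenFilter {

        private final TermAttribute termAtt = addAttribute(TermAttribute.class);

        public NormalizationFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false; // upstream is exhausted
            }
            String term = termAtt.term();
            // Only replace the buffer if normalization would change the term.
            if (!Normalizer.isNormalized(term, Normalizer.Form.NFC)) {
                termAtt.setTermBuffer(
                    Normalizer.normalize(term, Normalizer.Form.NFC));
            }
            return true;
        }
    }

    // Wraps StandardAnalyzer and tacks the filter onto its chain.
    public final class NormalizingAnalyzer extends Analyzer {

        // Version constant is a guess; match it to your Lucene release.
        private final Analyzer delegate = new StandardAnalyzer(Version.LUCENE_30);

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new NormalizationFilter(
                delegate.tokenStream(fieldName, reader));
        }
    }

The intent is to use the same analyzer at both index time and query
time, so the normalized forms agree.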
