I'd like to have all my queries and terms run through Unicode normalization before they are executed/indexed. I've had pretty good luck with the StandardAnalyzer for the past few years, so my plan is to write an analyzer that wraps it and tacks a custom TokenFilter onto the end of the chain the StandardAnalyzer provides.

I'm really not clear, though, on how to write a TokenFilter. My best guess is that I want a class that overrides getAttribute and uses java.text.Normalizer to normalize any TermAttribute returned from the upstream filter. Is that correct, or should the normalization go somewhere else?

Are there any docs on writing custom filters/analyzers? I didn't have much luck finding any.