List, I have written my own CustomAnalyzer, as follows:
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // TODO: add calls to RemovePunctuation and SplitIdentifiers here

        // First, convert to lower case
        TokenStream out = new LowerCaseTokenizer(reader);

        if (this.doStopping) {
            out = new StopFilter(true, out, customStopSet);
        }
        if (this.doStemming) {
            out = new PorterStemFilter(out);
        }
        return out;
    }

What I need to do is write two custom filters that do the following:

- RemovePunctuation() removes all characters except [a-zA-Z], preserving case. E.g.,
    "foo=bar*45;" ==> "foo bar 45"
    "fooBar" ==> "fooBar"
    "\"stho...@cs.queensu.ca\"" ==> "sthomas cs queensu ca"

- SplitIdentifiers() breaks up words based on camelCase notation:
    "fooBar" ==> "foo Bar"
    "ABCCompany" ==> "ABC Company"
  (I have the regex for this.) Note that this step must be performed before LowerCaseTokenizer, because we need case information to do the splitting.

How can I write custom filters, and how do I call them before LowerCaseTokenizer()?

Thanks in advance,
Steve
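
P.S. To make the intended behavior concrete, here is a minimal sketch of the two transformations as plain string operations, independent of Lucene. The class name IdentifierText and both method names are hypothetical. The camelCase regex is only an assumed one and not necessarily the regex mentioned above. The sketch also assumes digits are kept, as in the "foo bar 45" example, even though the description says [a-zA-Z]. It does not show how to wire these steps into a TokenStream chain.

```java
// Hypothetical helper: plain-string versions of the two steps described above.
final class IdentifierText {

    private IdentifierText() {}

    // Replace every run of characters other than ASCII letters and digits
    // with a single space, preserving case. Keeping digits is an assumption
    // taken from the "foo=bar*45;" example.
    static String removePunctuation(String input) {
        return input.replaceAll("[^a-zA-Z0-9]+", " ").trim();
    }

    // Insert a space at camelCase boundaries (assumed regex):
    //  - between a lowercase letter or digit and an uppercase letter (foo|Bar)
    //  - between an uppercase run and an uppercase letter that starts a
    //    capitalized word (ABC|Company)
    static String splitIdentifiers(String input) {
        return input.replaceAll(
                "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ");
    }

    public static void main(String[] args) {
        System.out.println(removePunctuation("foo=bar*45;"));
        System.out.println(splitIdentifiers("fooBar"));
        System.out.println(splitIdentifiers("ABCCompany"));
    }
}
```

Both methods must see the original case, so in this sketch lowercasing would be applied only after splitIdentifiers has run.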