WordDelimiterFilter doesn't explicitly use a Tokenizer -- that's the beauty of TokenFilters: you can compose them around any other TokenStream instance you want.

If you have a custom grammar file of your own that you like, you can use it to build your own Tokenizer and then wrap that up in a WordDelimiterFilter (and any other filters you want) to make a custom Analyzer ... this is all StandardAnalyzer does: it wraps the StandardTokenizer (which is built from a .jj file) with a few useful TokenFilters. See the sketch below.
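For example, here's a minimal (untested) sketch against the Lucene 2.x TokenStream API. "MyTokenizer" is a hypothetical stand-in for whatever Tokenizer you generate from your .jj grammar, the WordDelimiterFilter import assumes Solr's analysis package, and the flag values are copied from the code quoted below (you'll want to tune them so the filter doesn't re-split the very tokens you're trying to keep):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.WordDelimiterFilter;

    public class MyAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // MyTokenizer: hypothetical JavaCC-built Tokenizer that emits
        // "C++" / "C#" as single tokens, unlike WhitespaceTokenizer
        TokenStream stream = new MyTokenizer(reader);
        // same pattern StandardAnalyzer uses: one Tokenizer wrapped
        // in a chain of TokenFilters
        stream = new WordDelimiterFilter(stream, 1, 1, 1, 1, 1);
        stream = new LowerCaseFilter(stream);
        return stream;
      }
    }

The order matters: the Tokenizer decides what the initial tokens are, and each filter just transforms the stream it wraps.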
: Date: Fri, 17 Nov 2006 11:04:43 +0100
: From: Martin Braun <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
: To: java-user@lucene.apache.org
: Subject: Search "C++" with Solr's WordDelimiterFilter
:
: hi all,
:
: I would like to implement the possibility to search for "C++" and "C#".
: I found in the archive the hint to customize the appropriate *.jj file
: with the code in NutchAnalysis.jj:
:
: // irregular words
: | <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
: | <#C_PLUS_PLUS: ("C"|"c") "++" >
: | <#C_SHARP: ("C"|"c") "#" >
:
: I am using a custom analyzer with Yonik's WordDelimiterFilter:
:
: @Override
: public TokenStream tokenStream(String fieldName, Reader reader) {
:   return new LowerCaseFilter(new WordDelimiterFilter(
:       new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
: }
:
: But as I can see, WordDelimiterFilter uses only the WhitespaceTokenizer,
: which does not use a JavaCC file.
:
: What would be the best way to integrate this feature (preferably without
: changing the Lucene source)?
:
: Should I override the WhitespaceTokenizer using JavaCC (are there
: any docs on doing this?)?
:
: tia,
: martin

-Hoss