WordDelimiterFilter doesn't explicitly use a Tokenizer -- that's the beauty of TokenFilters: you can compose them around any other TokenStream instance you want.

If you have a custom grammar file of your own that you like, you can use it to build your own Tokenizer and then wrap that up in a WordDelimiterFilter (and any other filters you want) to make a custom Analyzer ... this is all StandardAnalyzer does: it wraps the StandardTokenizer (which is built from a .jj file) with a few useful TokenFilters. See the sketch below.
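For example, here's a minimal (untested) sketch against the Lucene 2.x TokenStream API. "MyTokenizer" is a hypothetical stand-in for whatever Tokenizer you generate from your .jj grammar, the WordDelimiterFilter import assumes Solr's analysis package, and the flag values are copied from the code quoted below (you'll want to tune them so the filter doesn't re-split the very tokens you're trying to keep):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.WordDelimiterFilter;

    public class MyAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        // MyTokenizer: hypothetical JavaCC-built Tokenizer that emits
        // "C++" / "C#" as single tokens, unlike WhitespaceTokenizer
        TokenStream stream = new MyTokenizer(reader);
        // same pattern StandardAnalyzer uses: one Tokenizer wrapped
        // in a chain of TokenFilters
        stream = new WordDelimiterFilter(stream, 1, 1, 1, 1, 1);
        stream = new LowerCaseFilter(stream);
        return stream;
      }
    }

The order matters: the Tokenizer decides what the initial tokens are, and each filter just transforms the stream it wraps.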
: Date: Fri, 17 Nov 2006 11:04:43 +0100
: From: Martin Braun <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
: To: java-user@lucene.apache.org
: Subject: Search "C++" with Solr's WordDelimiterFilter
:
: hi all,
:
: I would like to implement the possibility to search for "C++" and "C#".
: I found in the archive the hint to customize the appropriate *.jj file
: with the code in NutchAnalysis.jj:
:
: // irregular words
: | <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
: | <#C_PLUS_PLUS: ("C"|"c") "++" >
: | <#C_SHARP: ("C"|"c") "#" >
:
: I am using a custom analyzer with Yonik's WordDelimiterFilter:
:
: @Override
: public TokenStream tokenStream(String fieldName, Reader reader) {
:   return new LowerCaseFilter(new WordDelimiterFilter(
:       new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
: }
:
: But as I can see, WordDelimiterFilter uses only the WhitespaceTokenizer,
: which does not use a JavaCC file.
:
: What would be the best way to integrate this feature (preferably without
: changing the Lucene source)?
:
: Should I override the WhitespaceTokenizer using JavaCC (are there
: any docs on doing this?)?
:
: tia,
: martin

-Hoss