> Can someone please point me in the right direction?
> 
> We are creating an application that needs to be able to search on C++ and get
> back docs that have C++ in them. The StandardAnalyzer does not seem to index
> the "+", so a search for "C++" will bring back docs that contain C++, C,
> C#, etc. The WhitespaceAnalyzer will index the "+", but if we have the
> term "C++." (that is, if C++ is at the end of a sentence), it will index
> "C++.", so a search for "C++" will not return the doc. I have heard of maybe
> a CustomAnalyzer; however, it seems like there would actually need to be a
> CustomFilter/CustomTokenizer. I looked at:
>      - StandardAnalyzer.java
>      - StandardFilter.java
>      - StandardTokenizer.java
>      - StandardTokenizerImpl.java
>      - StandardTokenizerImpl.jflex
> 
> I would guess that the StandardTokenizer is where the changes would need to
> be made to allow the "+" character, but I am unclear as to how.
> 
> Any and all help is greatly appreciated.

One option is to modify StandardTokenizerImpl.jflex and generate 
CustomTokenizerImpl.java from it, so that C++ and C# are each recognized as a 
single token. You then need to write a new Tokenizer that uses the generated 
CustomTokenizerImpl.

Another option is to extend CharTokenizer. Start from the source code of 
LetterTokenizer and change isTokenChar so that it also accepts '+' and '#':

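  // Keep letters, '+' and '#' inside a token so that "C++" and "C#" stay whole;
  // any other character (space, '.', ',') ends the current token.
  @Override
  protected boolean isTokenChar(char c) {
    return Character.isLetter(c) || c == '+' || c == '#';
  }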
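
To see what this predicate does to the text in question, here is a minimal, dependency-free sketch. It does not use Lucene at all: the class PlusHashTokenizerSketch and its tokenize method are hypothetical names made up for illustration, and the loop only imitates the way a CharTokenizer emits maximal runs of characters accepted by isTokenChar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration; not part of Lucene.
public class PlusHashTokenizerSketch {

    // Same predicate as the isTokenChar override above.
    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || c == '+' || c == '#';
    }

    // Collects maximal runs of token characters; everything else is a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [We, use, C++, Also, C#, and, plain, C]
        System.out.println(tokenize("We use C++. Also C#, and plain C."));
    }
}
```

Note that the trailing "." is dropped, so "C++." and "C++" yield the same token. Two side effects of this predicate are worth keeping in mind: digits are not token characters, and a "+" or "#" standing on its own (as in "1 + 1") becomes a token too.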

Hope this helps.



