Hi,

This came up before, a while ago: http://www.nabble.com/searching-for-C%2B%2B-to18093942.html#a18093942

I don't think there is an easier way than modifying the standard analyzer. As I suggested in that earlier thread, I would make the analyzer recognize token patterns that consist of words with prefixed or postfixed symbols[1]. Your token filter will then receive tokens like "c++" or "~/.file". You can choose to pass them on as single tokens, or split them further into two or more tokens.

-John

[1] If you decide to try matching words with symbols in the middle, be aware that the StandardAnalyzer already handles some cases of this, such as e-mail addresses, so part of your pattern may turn out to be redundant.
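
To make the suggestion above more concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene classes: it only illustrates the token pattern (a word with optional prefixed or postfixed symbols) and the later choice between keeping such a token whole or splitting it. The class name SymbolTokenSketch, the method names, and the two symbol sets ("~", ".", "/" as prefixes and "+", "#" as postfixes) are assumptions made for illustration. In a real analyzer the equivalent rule would have to go into the tokenizer grammar itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Lucene.
final class SymbolTokenSketch {

    // Group 1: optional prefix symbols (assumed set: ~ . /)
    // Group 2: the word itself (letters and digits)
    // Group 3: optional postfix symbols (assumed set: + #)
    // The lookbehind prevents a prefix from starting in the middle of a word,
    // so "a.b" yields "a" and "b" rather than "a" and ".b".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<![\\p{L}\\p{N}])([~./]*)([\\p{L}\\p{N}]+)([+#]*)");

    // Returns each word together with its attached symbols, e.g. "c++".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Splits one token into its non-empty prefix, word, and postfix parts.
    // A token that does not fit the pattern is returned unchanged.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<String>();
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            parts.add(token);
            return parts;
        }
        for (int g = 1; g <= 3; g++) {
            if (!m.group(g).isEmpty()) {
                parts.add(m.group(g));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String token : tokenize("I use c++ and c# in ~/.file")) {
            System.out.println(token + " -> " + split(token));
        }
    }
}
```

A filter that wants "c++" to stay searchable would pass the whole token through, while one that prefers finer-grained matching could emit the parts returned by split instead.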

Weiwei Wang wrote:
To be more specific: I implemented an FTP search engine for campus students. I
have to handle many different kinds of words, including Chinese words, so I
can't use only WhitespaceAnalyzer. My analyzer currently looks like this:

    StandardTokenizer tokenStream = new StandardTokenizer(reader,
            replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new SnowballFilter(result, STEMMER);

I modified StandardTokenizer to split words like "season09" (for example, when
searching for "friends season 09") into "season" and "09".
The word "c++" is analyzed as "c".

I know I can modify StandardTokenizer to achieve my goal, but are there
any neater methods?

2009/4/9 hyj <hongyin...@163.com>


       WhitespaceAnalyzer can work.

======= 2009-04-09 15:15:14 =======

I want my Lucene index to support searching for words like c++ and c#. How can
I modify my analyzer to achieve this?

--
Weiwei Wang
Department of Computer Science
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Mobile: 86-13913310569
MSN: ww.wang...@gmail.com
Homepage: http://cs.nju.edu.cn/rl/weiweiwang
= = = = = = = = = = = = = = = = = = = =




hyj
hongyin...@163.com
2009-04-09



