To be more specific: I have implemented an FTP search engine for campus students.
It has to handle many different kinds of words, including Chinese words, so I
can't just use WhitespaceAnalyzer. My analyzer currently looks like this:

    // Split the text into tokens (StandardTokenizer, modified by me; see below)
    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    // Normalize the tokens: standard cleanup, lowercase, drop stop words, stem
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new SnowballFilter(result, STEMMER);
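For illustration only, here is a plain-Java sketch (no Lucene classes) of the order in which such a chain transforms text. The tokenizer below is a crude, hypothetical stand-in that splits on every character that is not a letter or digit. It does not reproduce StandardTokenizer's actual grammar; it only shows how dropping the "+" signs at the tokenizer stage turns "c++" into "c" before any filter runs. The stemming step is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {
    // Hypothetical stand-in for the tokenizer: splits on any run of characters
    // that are neither letters nor decimal digits, so "c++" loses its "+" signs.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Stand-in for LowerCaseFilter.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Stand-in for StopFilter: drops tokens found in the stop set.
    static List<String> stop(List<String> in, Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (String t : in) if (!stopSet.contains(t)) out.add(t);
        return out;
    }

    // Same order as the analyzer above: tokenize, lowercase, stop (no stemming).
    static List<String> analyze(String text, Set<String> stopSet) {
        return stop(lowerCase(tokenize(text)), stopSet);
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "of"));
        System.out.println(analyze("The Art of C++", stopSet)); // prints [art, c]
    }
}
```

Because the information is already lost at the tokenizer stage, no filter placed later in the chain can recover "c++".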

I have modified StandardTokenizer to split words like "season09" (e.g. when
searching for Friends season 09) into "season" and "09". However, the word
"c++" is still analyzed as "c".
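A minimal sketch of the tokenization rule being asked about, again in plain Java rather than as a Lucene Tokenizer subclass. The whitelist PROTECTED and the class name are hypothetical. The sketch assumes whitespace-separated input chunks; a chunk that matches the whitelist (case-insensitively) is kept intact, and every other chunk is split at letter/digit boundaries and at punctuation, so "season09" becomes "season" and "09". It lowercases inside the tokenizer so the whitelist lookup works, which differs from the chain above, where LowerCaseFilter runs after the tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProtectedTermTokenizer {
    // Hypothetical whitelist of terms whose symbols must survive tokenization.
    static final Set<String> PROTECTED = new HashSet<>(Arrays.asList("c++", "c#"));

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            String lower = chunk.toLowerCase(Locale.ROOT);
            if (PROTECTED.contains(lower)) {
                tokens.add(lower); // keep "c++" / "c#" as a single token
                continue;
            }
            // Otherwise emit runs of letters and runs of digits separately;
            // any other character just ends the current run.
            StringBuilder current = new StringBuilder();
            int currentType = 0; // 0 = other, 1 = letter, 2 = digit
            for (int i = 0; i < lower.length(); i++) {
                char ch = lower.charAt(i);
                int type = Character.isLetter(ch) ? 1 : Character.isDigit(ch) ? 2 : 0;
                if (type != currentType && current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (type != 0) current.append(ch);
                currentType = type;
            }
            if (current.length() > 0) tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [friends, season, 09, c++, c#]
        System.out.println(tokenize("Friends season09 c++ C#"));
    }
}
```

The whitelist keeps the special cases explicit instead of teaching the tokenizer to keep every "+" or "#", which would also keep unwanted tokens such as "a+b".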

I know I could modify StandardTokenizer further to achieve this, but are there
any other, neater methods?

2009/4/9 hyj <hongyin...@163.com>

> Hello, 王巍巍!
>
>        WhitespaceAnalyzer will work.
>
> ======= 2009-04-09 15:15:14, you wrote: =======
>
> >I want Lucene to be able to search for words like c++ and c#. How can I
> >modify my analyzer to achieve this?
> >
> >--
> >王巍巍(Weiwei Wang)
> >Department of Computer Science
> >Gulou Campus of Nanjing University
> >Nanjing, P.R.China, 210093
> >
> >Mobile: 86-13913310569
> >MSN: ww.wang...@gmail.com
> >Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>
> = = = = = = = = = = = = = = = = = = = =
>
> Best regards,
>
> hyj
> hongyin...@163.com
> 2009-04-09
