To be more specific: I implemented an FTP search engine for students on campus. I have to handle many different kinds of words, including Chinese words, so I can't simply use WhitespaceAnalyzer. My analyzer currently looks like this:

    StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
    tokenStream.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new SnowballFilter(result, STEMMER);

I have already modified StandardTokenizer to split words like season09 (for example, a search for "friends season 09") into "season" and "09". The word "c++", however, is analyzed as "c". I know I can modify StandardTokenizer further to achieve my goal, but is there a neater way?

2009/4/9 hyj <hongyin...@163.com>

> Hello 王巍巍 (Weiwei Wang),
>
> WhitespaceAnalyzer can work.
>
> ======= 2009-04-09 15:15:14, you wrote: =======
>
> > I want my Lucene search to handle words like c++ and c#. How can I modify
> > my analyzer to achieve this goal?
> >
> > --
> > 王巍巍 (Weiwei Wang)
> > Department of Computer Science
> > Gulou Campus of Nanjing University
> > Nanjing, P.R.China, 210093
> >
> > Mobile: 86-13913310569
> > MSN: ww.wang...@gmail.com
> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>
> Best regards,
>
> hyj
> hongyin...@163.com
> 2009-04-09

--
王巍巍 (Weiwei Wang)
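
For illustration, the tokenization rule asked about above (split "season09" into "season" and "09", but keep "c++" and "c#" whole) could be sketched roughly as follows. This is a standalone sketch in plain Java and does not use any Lucene API; the class name SketchTokenizer and the regular expression are assumptions made for this example, and Chinese text, stop words and stemming are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it only illustrates the splitting rule.
class SketchTokenizer {
    // A token is either a run of letters, optionally followed by "++" or "#",
    // or a run of decimal digits. Anything else acts as a separator.
    private static final Pattern TOKEN =
        Pattern.compile("\\p{L}+(?:\\+\\+|#)?|\\p{Nd}+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Lowercase first, mirroring the LowerCaseFilter step in the analyzer chain.
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

With this rule, SketchTokenizer.tokenize("Friends season09") yields "friends", "season", "09", and SketchTokenizer.tokenize("C++ and C#") yields "c++", "and", "c#". Applying the same idea inside a Lucene analyzer would still require a custom or modified tokenizer, since the later filters only see what the tokenizer emits.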