Hi,
This came up before, a while ago:
http://www.nabble.com/searching-for-C%2B%2B-to18093942.html#a18093942
I don't think there is an easier way than modifying the standard
analyzer. As I suggested in that earlier thread, I would make the
analyzer recognize token patterns that consist of words with prefixed or
postfixed symbols [1]. Then you will receive tokens like "c++" or
"~/.file" in your token filter. You can then choose to pass them through
as single tokens, or split them further into two or more tokens.
-John
[1] If you decide to try matching words with symbols in the middle, be
aware that the StandardAnalyzer already handles some examples of this,
such as e-mail addresses, so you may make something redundant.
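
The idea above can be sketched outside of Lucene with plain java.util.regex. This is only an illustration under stated assumptions: the class name SymbolTokenSketch, the symbol sets in the pattern, and the keepWhole set are all hypothetical, and none of this is Lucene API or the actual StandardTokenizer grammar. It shows a first pass that recognizes words with prefixed or postfixed symbols, and a second pass that either keeps such a token whole or reduces it to its bare word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SymbolTokenSketch {
    // Optional prefix symbols (only when not preceded by a letter/digit),
    // then a run of letters/digits, then optional postfix symbols.
    // The symbol sets are illustrative assumptions, not Lucene's grammar.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<![\\p{L}\\p{N}])[~/.]*[\\p{L}\\p{N}]+[+#]*");

    // The bare word inside a token, without any symbols.
    private static final Pattern CORE = Pattern.compile("[\\p{L}\\p{N}]+");

    // Tokenizer stage: emit words together with their attached symbols.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Filter stage: lowercase each token, keep it whole if it is listed in
    // keepWhole (a hypothetical whitelist), otherwise keep only the bare word.
    static List<String> filter(List<String> tokens, Set<String> keepWhole) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (keepWhole.contains(lower)) {
                out.add(lower);
            } else {
                Matcher m = CORE.matcher(lower);
                if (m.find()) {
                    out.add(m.group());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> keepWhole = new HashSet<>(Arrays.asList("c++", "c#"));
        List<String> raw = tokenize("I use C++ and c# in ~/.file");
        // prints [i, use, c++, and, c#, in, file]
        System.out.println(filter(raw, keepWhole));
    }
}
```

In a real analyzer the first stage would live in the tokenizer grammar and the second in a TokenFilter; the whitelist policy here is just one possible choice for deciding which tokens to pass through unchanged.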
Weiwei Wang wrote:
In more detail: I implemented an FTP search engine for campus students. I
have to handle many different kinds of words, including Chinese words, so I
can't use only WhitespaceAnalyzer. My analyzer currently looks like this:
StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
tokenStream.setMaxTokenLength(maxTokenLength);
TokenStream result = new StandardFilter(tokenStream);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopSet);
result = new SnowballFilter(result,STEMMER);
I modified StandardTokenizer to split words like "season09" into "season"
and "09" (for searches such as "friends season 09"). However, the word
"c++" is analyzed as "c".
I know I can modify StandardTokenizer to achieve my goal, but is there a
neater method?
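
For readers following along, the "season09" split described above can be illustrated with a small standalone sketch. It is not the poster's StandardTokenizer modification; the class name LetterDigitSplitSketch is hypothetical, and it simply splits at every boundary between a letter and a digit using zero-width regex lookarounds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LetterDigitSplitSketch {
    // Zero-width boundary: a letter followed by a digit, or a digit
    // followed by a letter. Nothing is consumed, so no characters are lost.
    private static final Pattern BOUNDARY = Pattern.compile(
        "(?<=\\p{L})(?=\\p{N})|(?<=\\p{N})(?=\\p{L})");

    // Split one token at each letter/digit boundary.
    static List<String> split(String token) {
        return Arrays.asList(BOUNDARY.split(token));
    }

    public static void main(String[] args) {
        System.out.println(split("season09")); // prints [season, 09]
        System.out.println(split("c++"));      // prints [c++]
    }
}
```

Note that this step leaves "c++" untouched, so the loss of "++" happens earlier, when the tokenizer discards the symbols.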
2009/4/9 hyj <hongyin...@163.com>
WhitespaceAnalyzer can work.
======= 2009-04-09 15:15:14, you wrote: =======
I want my Lucene index to support searching for words like c++ and c#.
How can I modify my analyzer to achieve this?
--
Weiwei Wang
Department of Computer Science
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093
Mobile: 86-13913310569
MSN: ww.wang...@gmail.com
Homepage: http://cs.nju.edu.cn/rl/weiweiwang
= = = = = = = = = = = = = = = = = = = =
hyj
hongyin...@163.com
2009-04-09