Hi, all, Suppose I want to index this string NashQ/c++.test and i used the following procedure to do the processing. NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap(); RECOVERY_MAP.add("c++","cplusplus$"); CharFilter filter = new LowercaseCharFilter(reader);//LowercaseCharFilter, see the source code at the bottome filter = new MappingCharFilter(RECOVERY_MAP,filter); StandardTokenizer tokenStream = new StandardTokenizer(Version.LUCENE_30, filter); tokenStream.setMaxTokenLength(maxTokenLength); TokenStream result = new StandardFilter(tokenStream); //result = new LowerCaseFilter(result); result = new StopFilter(enableStopPositionIncrements, result, stopSet); result = new SnowballFilter(result, STEMMER);
When i search this index with keyword nashq, nothing is returned, but if I uncomment resut=new LowerCaseFilter(result); it will works fine? Does anybody here know what's going on? It seems my LowercaseCharFilter has already done this job. //code for LowercaseCharFilter package analysis; import java.io.IOException; import java.io.Reader; import org.apache.lucene.analysis.BaseCharFilter; import org.apache.lucene.analysis.CharReader; import org.apache.lucene.analysis.CharStream; public class LowercaseCharFilter extends BaseCharFilter { public LowercaseCharFilter(CharStream in) { super(in); } public LowercaseCharFilter(Reader in) { super(CharReader.get(in)); } @Override public int read() throws IOException { return Character.toLowerCase(input.read()); } @Override public int read(char[] cbuf, int off, int len) throws IOException { int ret = input.read(cbuf, off, len); if(ret!=-1) { for(int i=off; i<off+ret; i++) cbuf[i] = Character.toLowerCase(cbuf[i]); } return ret; } } -- Weiwei Wang Alex Wang 王巍巍 Room 403, Mengmin Wei Building Computer Science Department Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Homepage: http://cs.nju.edu.cn/rl/weiweiwang