I am using a custom analyzer:
    public TokenStream tokenStream(String fieldName, Reader reader) {
        StandardTokenizer tokenStream = new StandardTokenizer(reader);
        tokenStream.setMaxTokenLength(maxTokenLength);
        TokenStream result = new ASCIIFoldingFilter(tokenStream);
        result = new StandardFilter(result);
        result = new LengthFilter(result, 3, maxTokenLength);
        result = new LowerCaseFilter(result);
        result = new StopFilter(true, result, stopSet);
        result = new PorterStemFilter(result);
        return result;
    }

My question is about creating a new tokenizer that can detect people's names, place names, etc. (I can look these up in my local database to find such cases.)

E.g.: if a text contains "Joe Coder is in New York", then instead of the term vectors [Joe][Coder][New][York], I would like to have the term vectors [Joe Coder][New York].

Is there any tokenizer in Lucene that I can extend to provide this functionality? Any other pointers?
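
To make the desired behaviour concrete, here is a rough sketch of the merge step in plain Java, with no Lucene classes involved. Everything in it is hypothetical: the class name PhraseMerger, the merge method, and the in-memory set of known phrases, which stands in for the lookup against my local database. It simply walks the tokens and greedily prefers the longest known phrase starting at each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: merges adjacent tokens that
// together form a known phrase (e.g. a person or place name).
public class PhraseMerger {

    // knownPhrases stands in for the local database lookup (assumption).
    // maxWords bounds how many adjacent tokens are tried as one phrase.
    public static List<String> merge(List<String> tokens,
                                     Set<String> knownPhrases,
                                     int maxWords) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < tokens.size()) {
            String best = tokens.get(i); // fall back to the single token
            int consumed = 1;
            // Try the longest candidate first, then shorter ones.
            int longest = Math.min(maxWords, tokens.size() - i);
            for (int n = longest; n > 1; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                if (knownPhrases.contains(candidate)) {
                    best = candidate;
                    consumed = n;
                    break;
                }
            }
            out.add(best);
            i += consumed;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> known =
            new HashSet<String>(Arrays.asList("Joe Coder", "New York"));
        List<String> tokens =
            Arrays.asList("Joe", "Coder", "is", "in", "New", "York");
        // prints [Joe Coder, is, in, New York]
        System.out.println(merge(tokens, known, 3));
    }
}
```

In the analyzer above, I assume this logic would have to sit in a custom filter placed before LowerCaseFilter and PorterStemFilter (or the lookup keys would need the same normalization), since otherwise the tokens would no longer match the names stored in the database. The stop filter would then drop "is" and "in" as before.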