On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <li...@selckin.be> wrote:
> On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <li...@selckin.be> wrote:
>
>> On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t...@fastmail.fm> wrote:
>>
>>> On 20/02/2013 11:28, Paul Taylor wrote:
>>>
>>>> Just updating codebase from Lucene 3.6 to Lucene 4.1 and seems my tests
>>>> that use NormalizeCharMap for replacing characters in the analyzers are not
>>>> working.
>>>
>>> bump, anybody? I thought a self-contained testcase would be enough to
>>> pique somebody's interest. Am I doing something silly? Maybe, but I can't
>>> see it.
>>
>> Tried to run your test but it uses MusicbrainzTokenizer
>
> Well, I made it work. Whether it's a bug that this is required, or whether
> it is documented anywhere, I don't know; it does seem very trappy:

It is documented all the way at the bottom:
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/package-summary.html

So it should be:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.charfilter.MappingCharFilter;
    import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.util.Version;

    class SimpleAnalyzer extends Analyzer {

        protected NormalizeCharMap charConvertMap;

        public SimpleAnalyzer() {
            // Build the character mapping once; here "&" is rewritten to "and".
            NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
            builder.add("&", "and");
            charConvertMap = builder.build();
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Tokenizer and token filters only; the char filter is not applied here.
            Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
            TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
            return new TokenStreamComponents(source, filter);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // Wrap the reader in the char filter so the mapping runs before tokenizing.
            return new MappingCharFilter(charConvertMap, reader);
        }
    }
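
To make the order of operations concrete, here is a rough, Lucene-free sketch of what the analyzer above does: the reader is rewritten first (the job of initReader), and only then is the text split on whitespace and lowercased. Everything in it is a hypothetical stand-in written with the plain JDK (CharFilterOrderSketch, its initReader, analyze and readAll are not Lucene APIs), and it buffers the whole input instead of streaming, so treat it as an illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical JDK-only stand-in; none of these names come from Lucene.
class CharFilterOrderSketch {

    // Plays the role of initReader + MappingCharFilter:
    // rewrites the raw characters ("&" -> "and") before any tokenizing.
    static Reader initReader(Reader reader) {
        return new StringReader(readAll(reader).replace("&", "and"));
    }

    // Plays the role of the tokenizer + lowercase filter chain:
    // split the already-filtered text on whitespace, then lowercase each token.
    static List<String> analyze(String text) {
        Reader filtered = initReader(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        for (String token : readAll(filtered).split("\\s+")) {
            if (!token.isEmpty()) { // skip the empty piece produced by leading whitespace
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Drains a Reader into a String (simplification: no streaming).
    private static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling CharFilterOrderSketch.analyze("Fish & Chips") yields the tokens fish, and, chips. If the initReader step is skipped, the "&" reaches the tokenizer unchanged, which is the kind of symptom reported at the top of the thread.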