On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <li...@selckin.be> wrote:
> On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t...@fastmail.fm> wrote:
>> On 20/02/2013 11:28, Paul Taylor wrote:
>>> Just updating the codebase from Lucene 3.6 to Lucene 4.1, and it seems my
>>> tests that use NormalizeCharMap for replacing characters in the analyzers
>>> are not working.
>>
>> Bump, anybody? I thought a self-contained test case would be enough to
>> pique somebody's interest. Am I doing something silly? Maybe, but I can't
>> see it.
>
> Tried to run your test but it uses MusicbrainzTokenizer.

Well I made it work. Whether it's a bug that this is required, or whether it
is documented anywhere, I don't know, but it does seem very trappy:

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

class SimpleAnalyzer extends Analyzer {

    protected NormalizeCharMap charConvertMap;

    public SimpleAnalyzer() {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("&", "and");
        charConvertMap = builder.build();
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Wrap the initial reader with the char filter...
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40,
                new MappingCharFilter(charConvertMap, reader));
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
        return new TokenStreamComponents(source, filter) {
            @Override
            protected void setReader(Reader reader) throws IOException {
                // ...and wrap it again here, because on analyzer reuse the new
                // reader goes straight to setReader and createComponents is not
                // called again.
                super.setReader(new MappingCharFilter(charConvertMap, reader));
            }
        };
    }
}
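
If the goal is just to get the char filter applied on reuse as well, overriding
the Analyzer's initReader hook looks like a less trappy route, since (in the
4.x Analyzer, as far as I can tell) it is applied both when the components are
first created and when an existing tokenizer is handed a new reader. A rough
sketch along those lines, not tested against the original failing test (the
SimpleAnalyzer2 name is just to keep it apart from the class above):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

class SimpleAnalyzer2 extends Analyzer {

    private final NormalizeCharMap charConvertMap;

    SimpleAnalyzer2() {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("&", "and");
        charConvertMap = builder.build();
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Wrap every incoming reader, initial use and reuse alike.
        return new MappingCharFilter(charConvertMap, reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The reader arriving here has already been wrapped by initReader.
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
        return new TokenStreamComponents(source, filter);
    }
}

That way the mapping lives in one place instead of two.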