Hi everyone,

I am trying to port some Lucene 3.2-era code that uses the ASCIIFoldingFilter forward to 4.2. The token stream handling has changed significantly since then, and I cannot figure out what I am doing wrong.
It seems that I should extend AnalyzerWrapper so that I can intercept the TokenStream and filter it with the ASCIIFoldingFilter. I have written the following code:

```java
public final class TokenFilterAnalyzerWrapper extends AnalyzerWrapper {

    private final Analyzer baseAnalyzer;
    private final TokenFilterFactory tokenFilterFactory;

    public TokenFilterAnalyzerWrapper(Analyzer baseAnalyzer,
            TokenFilterFactory tokenFilterFactory) {
        this.baseAnalyzer = baseAnalyzer;
        this.tokenFilterFactory = tokenFilterFactory;
    }

    @Override
    public void close() {
        baseAnalyzer.close();
        super.close();
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return baseAnalyzer;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName,
            TokenStreamComponents components) {
        return new TokenStreamComponents(components.getTokenizer(),
                tokenFilterFactory.create(components.getTokenStream()));
    }
}
```

and the following test case:

```java
public class TokenFilterAnalyzerWrapperTest {

    @Test
    public void testFilter() throws Exception {
        char[] expected = {'a', 'e', 'i', 'o', 'u'};
        try (Analyzer analyzer = new TokenFilterAnalyzerWrapper(
                new StandardAnalyzer(Version.LUCENE_42),
                new ASCIIFoldingFilterFactory())) {
            TokenStream stream = analyzer.tokenStream("test",
                    new StringReader("a é î ø ü"));
            for (int i = 0; i < 5; i++) {
                assertTrue(stream.incrementToken());
                assertEquals(Character.toString(expected[i]),
                        stream.getAttribute(CharTermAttribute.class).toString());
            }
            assertFalse(stream.incrementToken());
        }
    }
}
```

but all I can produce is this NullPointerException:

```
java.lang.NullPointerException
	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
	at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
	at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
	at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
	at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
	at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:71)
	at xyz.search.lucene.TokenFilterAnalyzerWrapperTest.testFilter(TokenFilterAnalyzerWrapperTest.java:27)
```

StandardTokenizerImpl.java:923 is:

```java
/* finally: fill the buffer with new input */
int numRead = zzReader.read(zzBuffer, zzEndRead, zzBuffer.length - zzEndRead);
```

The reader is clearly the unexpectedly null value; however, I cannot figure out how to set it correctly. Through experimentation it seems that I can evade some problems by calling reset() and setReader() at various points, but I always end up at some other exception buried deep within, so I believe I am still missing some piece of the puzzle.

Any help greatly appreciated!

Thanks,
Steven
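P.S. For reference, here is one consume-loop variant I tried, based on my reading of the 4.x TokenStream javadoc, which documents a reset()/incrementToken()/end()/close() workflow. It uses the same analyzer and input as my test above; whether this is the complete contract is exactly what I am unsure about:

```java
TokenStream stream = analyzer.tokenStream("test", new StringReader("a é î ø ü"));
// Grab the term attribute once, up front, rather than per token.
CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
stream.reset();                      // the javadoc says this is mandatory before the first incrementToken()
while (stream.incrementToken()) {
    System.out.println(term.toString());
}
stream.end();                        // consume end-of-stream state (e.g. final offset)
stream.close();
```

This gets me further than my original test, but reusing the analyzer for a second call still blows up on me.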