Hi!

I am trying to use WhitespaceAnalyzer from Lucene 4.0 to split strings into words. I wrote a small test:

    @Test
    public void whitespaceAnalyzerTest() throws IOException {
        String string = "sdfdsf sdfsdf sd sdf ";
        Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
        TokenStream tokenStream = wa.tokenStream("", new StringReader(string));
        while (tokenStream.incrementToken()) {
            System.out.println(tokenStream.getAttribute(CharTermAttribute.class).toString());
        }
    }

but I got an exception:

    java.lang.ArrayIndexOutOfBoundsException: -1
        at java.lang.Character.codePointAtImpl(Character.java:2405)
        at java.lang.Character.codePointAt(Character.java:2369)
        at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
        at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
        at com.maxx.tests.lucene40test.analyzer.AnalyzerTest.whitespaceAnalyzerTest(AnalyzerTest.java:93)
        ...

If I change WhitespaceAnalyzer to StandardAnalyzer, it works correctly. As a workaround I can create a StandardAnalyzer without stop words, but why doesn't my code work?

--
Krasovskiy Maxim