Hi Maxim, you need to reset the tokenStream before the while loop: tokenStream.reset()
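For reference, here is a minimal sketch of the full consume sequence (reset, incrementToken loop, end, close), modeled on the Lucene 4.0 analysis package docs. It assumes lucene-core and lucene-analyzers-common 4.0 are on the classpath; the class name WhitespaceTokenizeSketch and the tokenize helper are hypothetical names for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class: splits a string on whitespace via Lucene 4.0.
public class WhitespaceTokenizeSketch {

    public static List<String> tokenize(String text) throws IOException {
        Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
        // The field name does not matter to WhitespaceAnalyzer; "" mirrors the original test.
        TokenStream ts = analyzer.tokenStream("", new StringReader(text));
        // Register the attribute once; the stream updates this same instance per token.
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        try {
            ts.reset(); // Required before the first incrementToken() call.
            while (ts.incrementToken()) {
                tokens.add(termAtt.toString()); // copy the term: the attribute is reused
            }
            ts.end(); // end-of-stream work, such as setting the final offset
        } finally {
            ts.close(); // releases the underlying Reader
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        for (String token : tokenize("sdfdsf sdfsdf sd sdf ")) {
            System.out.println(token);
        }
    }
}
```

The sketch uses try/finally instead of try-with-resources because Lucene 4.0 still supports Java 6. WhitespaceAnalyzer does not lowercase, so the tokens come back exactly as they appear in the input.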
Check out http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html and look under "invoking the analyzer":

"ts.reset(); // Resets this stream to the beginning. (Required)"

Alon

On Tue, Jan 15, 2013 at 1:28 PM, Maksym Krasovskiy <m...@ciklum.com> wrote:

> Hi!
> I'm trying to use WhitespaceAnalyzer from Lucene 4.0 to split strings into
> words.
> I wrote a small test:
> @Test
> public void whitespaceAnalyzerTest() throws IOException {
>     String string = "sdfdsf sdfsdf sd sdf ";
>     Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
>     TokenStream tokenStream = wa.tokenStream("", new StringReader(string));
>     while (tokenStream.incrementToken()) {
>         System.out.println(tokenStream.getAttribute(CharTermAttribute.class).toString());
>     }
> }
>
> but got an exception:
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.lang.Character.codePointAtImpl(Character.java:2405)
>     at java.lang.Character.codePointAt(Character.java:2369)
>     at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
>     at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
>     at com.maxx.tests.lucene40test.analyzer.AnalyzerTest.whitespaceAnalyzerTest(AnalyzerTest.java:93)
>     ...
>
> If I change WhitespaceAnalyzer to StandardAnalyzer, it works correctly.
> As a workaround I can create a StandardAnalyzer without stopwords, but why
> doesn’t my code work?
>
> --
> Krasovskiy Maxim