This is intentional (since you have a bug in your code): you need to call reset() before the first call to incrementToken(). See the TokenStream contract, step 2: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html
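For example, your tokenize() method could follow the full consumer workflow like this. This is a minimal sketch that assumes Lucene 4.0 (as in your code) with the analyzers-common module on the classpath. The class name TokenizeDemo and the choice to return the terms as a List are only for illustration. The key change is calling reset() before the loop, then end() and close() afterwards:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class; assumes the Lucene 4.0 API used in the original question.
public class TokenizeDemo {

    // Collects the terms produced by a WhitespaceTokenizer, following the
    // TokenStream consumer workflow: reset(), incrementToken() loop, end(), close().
    static List<String> tokenize(String s) throws IOException {
        Tokenizer t = new WhitespaceTokenizer(Version.LUCENE_40, new StringReader(s));
        // addAttribute returns the instance the tokenizer already registered
        CharTermAttribute term = t.addAttribute(CharTermAttribute.class);
        List<String> terms = new ArrayList<String>();
        try {
            t.reset();                      // required before the first incrementToken()
            while (t.incrementToken()) {
                terms.add(term.toString());
            }
            t.end();                        // sets end-of-stream state such as the final offset
        } finally {
            t.close();                      // releases the underlying Reader
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        for (String term : tokenize("The quick brown fox")) {
            System.out.println(term);
        }
    }
}
```

Running main() should print The, quick, brown and fox on separate lines. addAttribute() is used instead of getAttribute() because it never throws if the attribute is missing; for a tokenizer that already registers CharTermAttribute, both return the same instance.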
On Thu, Nov 1, 2012 at 7:31 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:
> I'm trying to write a very simple method to show the different tokens that
> come out of a tokenizer. when I call WhitespaceTokenizer's (or
> LetterTokenizer's) incrementToken() method though I get an
> ArrayIndexOutOfBoundsException (see below)
>
> any ideas?
>
> p.s. if I use StandardTokenizer it works.
>
>
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.lang.Character.codePointAtImpl(Character.java:4739)
>     at java.lang.Character.codePointAt(Character.java:4702)
>     at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
>     at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
>     at test.Test1.tokenize(Test1.java:46)
>     at test.Test1.main(Test1.java:139)
>
>
> class Test1 {
>
>     static Version v = Version.LUCENE_40;
>
>     static void tokenize( String s ) throws IOException {
>
>         Reader r = new StringReader( s );
>
>         Tokenizer t = new WhitespaceTokenizer( v, r );
>
>         CharTermAttribute attrTerm = t.getAttribute( CharTermAttribute.class );
>
>         while ( t.incrementToken() ) {
>
>             String term = attrTerm.toString();
>
>             System.out.println( term );
>         }
>     }
>
>     public static void main( String[] args ) throws IOException {
>
>         String[] text = {
>
>             "The quick brown fox jumps over the lazy dog",
>             "Only the fool would take trouble to verify that his sentence " +
>             "was composed of ten a's, three b's, four c's, four d's, forty-six e's, " +
>             "sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four " +
>             "m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, " +
>             "thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, " +
>             "twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but " +
>             "not least, a single!",
>
>         };
>
>         for ( String s : text )
>             tokenize( s );
>     }
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org