I'm trying to write a very simple method that shows the tokens produced by a tokenizer. When I call incrementToken() on a WhitespaceTokenizer (or a LetterTokenizer), though, I get an ArrayIndexOutOfBoundsException (stack trace below).

Any ideas?

P.S. If I use StandardTokenizer instead, it works.


java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:4739)
    at java.lang.Character.codePointAt(Character.java:4702)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
    at test.Test1.tokenize(Test1.java:46)
    at test.Test1.main(Test1.java:139)


package test;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

class Test1 {

    static Version v = Version.LUCENE_40;

    // Prints each term the tokenizer produces, one per line.
    static void tokenize( String s ) throws IOException {

        Reader r = new StringReader( s );

        Tokenizer t = new WhitespaceTokenizer( v, r );

        CharTermAttribute attrTerm = t.getAttribute( CharTermAttribute.class );

        // The exception above is thrown from this incrementToken() call.
        while ( t.incrementToken() ) {

            String term = attrTerm.toString();

            System.out.println( term );
        }
    }


    public static void main( String[] args ) throws IOException {

        String[] text = {

            "The quick brown fox jumps over the lazy dog",
            "Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's, four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single!",

        };

        for ( String s : text )
            tokenize( s );

    }

}
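A likely cause, assuming the Lucene 4.x TokenStream consumer contract described in the TokenStream javadocs: a consumer must call reset() before the first incrementToken(), then call end() and close() once the stream is exhausted. The loop above never calls reset(). That StandardTokenizer happens to run without it does not make the call optional. Below is a minimal sketch of the same loop following that workflow. The class name TokenizeDemo and the helper tokens() are hypothetical, and it assumes lucene-core and lucene-analyzers-common 4.0 are on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class: collects the terms a WhitespaceTokenizer emits.
class TokenizeDemo {

    static final Version V = Version.LUCENE_40;

    // Follows the consumer workflow: reset(), incrementToken() until false, end(), close().
    static List<String> tokens(String s) throws IOException {
        Reader r = new StringReader(s);
        Tokenizer t = new WhitespaceTokenizer(V, r);
        // addAttribute returns the instance the tokenizer already registered.
        CharTermAttribute term = t.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        try {
            t.reset();                    // must come before the first incrementToken()
            while (t.incrementToken()) {
                out.add(term.toString());
            }
            t.end();                      // lets the stream record its final state
        } finally {
            t.close();                    // also closes the underlying Reader
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String tok : tokens("The quick brown fox jumps over the lazy dog")) {
            System.out.println(tok);
        }
    }
}
```

Because WhitespaceTokenizer splits only on whitespace, punctuation stays attached to the terms, so "a's, b's" yields "a's," and "b's".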

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
