tokenizer's tokens

Igal @ getRailo.org Thu, 01 Nov 2012 16:32:22 -0700

I'm trying to write a very simple method to show the different tokens that come out of a tokenizer. When I call WhitespaceTokenizer's (or LetterTokenizer's) incrementToken() method, though, I get an ArrayIndexOutOfBoundsException (see below).


Any ideas?


P.S. If I use StandardTokenizer, it works.


java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:4739)
    at java.lang.Character.codePointAt(Character.java:4702)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
    at test.Test1.tokenize(Test1.java:46)
    at test.Test1.main(Test1.java:139)


package test;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

class Test1 {

    static Version v = Version.LUCENE_40;

    // Prints each token that the tokenizer produces for s, one per line.
    static void tokenize( String s ) throws IOException {

        Reader r = new StringReader( s );
        Tokenizer t = new WhitespaceTokenizer( v, r );
        CharTermAttribute attrTerm = t.getAttribute( CharTermAttribute.class );

        while ( t.incrementToken() ) {
            String term = attrTerm.toString();
            System.out.println( term );
        }
    }

    public static void main( String[] args ) throws IOException {

        String[] text = {
            "The quick brown fox jumps over the lazy dog",
            "Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's, four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single!",
        };

        for ( String s : text )
            tokenize( s );
    }
}
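A likely explanation, offered as a sketch rather than a confirmed diagnosis: the Lucene 4.x TokenStream javadoc describes the consumer workflow as reset(), then incrementToken() until it returns false, then end(), then close(), and the tokenize() method above never calls reset() before its first incrementToken(). The version below follows that workflow and runs the same text through all three tokenizers mentioned in the post so their outputs can be compared. The class name TokenDump, the helper tokens(), and the sample string are hypothetical; it assumes Lucene 4.0 (lucene-core plus lucene-analyzers-common) on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class: collects the terms a Tokenizer emits.
public class TokenDump {

    static final Version V = Version.LUCENE_40;

    // Consumes the tokenizer in the documented order:
    // reset() -> incrementToken() until false -> end() -> close().
    static List<String> tokens(Tokenizer t) throws IOException {
        // addAttribute returns the existing instance if the tokenizer already registered one
        CharTermAttribute term = t.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        try {
            t.reset();                  // must come before the first incrementToken()
            while (t.incrementToken()) {
                out.add(term.toString());
            }
            t.end();                    // signals end of stream (final offset bookkeeping)
        } finally {
            t.close();                  // closes the underlying Reader
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String s = "The quick brown fox's jumps";
        // A fresh Reader per tokenizer: each Tokenizer consumes its input once.
        System.out.println(tokens(new WhitespaceTokenizer(V, new StringReader(s))));
        System.out.println(tokens(new LetterTokenizer(V, new StringReader(s))));
        System.out.println(tokens(new StandardTokenizer(V, new StringReader(s))));
    }
}
```

With reset() in place, WhitespaceTokenizer should keep "fox's" as one token while LetterTokenizer splits it at the apostrophe, which is the kind of difference the original method was meant to display.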

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
