thank you :)

On 11/1/2012 4:45 PM, Robert Muir wrote:
this is intentional (since you have a bug in your code).

you need to call reset() before the first call to incrementToken(); see the TokenStream contract, step 2:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html
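
A minimal sketch of the fix, assuming Lucene 4.0 (lucene-core plus lucene-analyzers-common) is on the classpath; the class name TokenizeDemo is hypothetical. It calls reset() before the incrementToken() loop, then end() and close() once the stream is exhausted. The same pattern applies to the tokenize() method in the quoted code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical demo class (not part of Lucene).
public class TokenizeDemo {

    // Returns the terms WhitespaceTokenizer produces for text.
    static List<String> tokenize(String text) throws IOException {
        List<String> terms = new ArrayList<String>();
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_40, new StringReader(text));
        // Keep a reference to the term attribute; its contents change on every incrementToken().
        CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                     // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());
            }
            ts.end();                       // end-of-stream operations
        } finally {
            ts.close();                     // release the underlying Reader
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("The quick brown fox")); // prints [The, quick, brown, fox]
    }
}
```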

On Thu, Nov 1, 2012 at 7:31 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:
I'm trying to write a very simple method to show the different tokens that
come out of a tokenizer.  When I call the incrementToken() method of
WhitespaceTokenizer (or LetterTokenizer), though, I get an
ArrayIndexOutOfBoundsException (see below).

any ideas?

p.s.  if I use StandardTokenizer it works.


java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:4739)
    at java.lang.Character.codePointAt(Character.java:4702)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
    at test.Test1.tokenize(Test1.java:46)
    at test.Test1.main(Test1.java:139)


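package test;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

class Test1 {

    static Version v = Version.LUCENE_40;

    // Prints each token that WhitespaceTokenizer produces for s.
    static void tokenize( String s ) throws IOException {

        Reader r = new StringReader( s );

        Tokenizer t = new WhitespaceTokenizer( v, r );

        CharTermAttribute attrTerm = t.getAttribute( CharTermAttribute.class );

        // reset() is not called before this loop, so the first
        // incrementToken() throws ArrayIndexOutOfBoundsException.
        while ( t.incrementToken() ) {

            String term = attrTerm.toString();

            System.out.println( term );
        }
    }

    public static void main( String[] args ) throws IOException {

        String[] text = {

            "The quick brown fox jumps over the lazy dog",
            "Only the fool would take trouble to verify that his sentence "
                + "was composed of ten a's, three b's, four c's, four d's, forty-six e's, "
                + "sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four "
                + "m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, "
                + "thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, "
                + "twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but "
                + "not least, a single!",

        };

        for ( String s : text )
            tokenize( s );

    }

}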

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
