A token has to be shorter than 255 characters; otherwise this tokenizer behaves unpredictably. I see that 255 is set as a private final constant in the source code, but no documentation explicitly mentions it. Can we either make that number configurable (if that is not an option, I'd like to know why), or add a note to its Javadoc? I had a hard time figuring that out...
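
To make the request concrete, here is a minimal standalone sketch of what a configurable limit could look like. It is not Lucene's code: the class `TokenLengthSketch`, the method `tokenize`, and the parameter `maxTokenLen` are hypothetical names, and the behavior of cutting an over-long token into chunks of at most `maxTokenLen` characters is only my assumption about what the tokenizer does when it hits the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLengthSketch {
    // Hypothetical default, mirroring the 255 limit discussed above.
    static final int DEFAULT_MAX_TOKEN_LEN = 255;

    // Splits text on whitespace. A run of non-whitespace characters longer
    // than maxTokenLen is emitted as several chunks (assumed behavior).
    static List<String> tokenize(String text, int maxTokenLen) {
        if (maxTokenLen < 1) {
            throw new IllegalArgumentException("maxTokenLen must be >= 1");
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
                // Buffer is full: emit it and start a new chunk.
                if (current.length() == maxTokenLen) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        // Emit whatever is left at the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Build a 300-character word to exceed the default limit.
        StringBuilder longWord = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longWord.append('x');
        }
        // Print the length of each token produced with the default limit.
        for (String token : tokenize("short " + longWord, DEFAULT_MAX_TOKEN_LEN)) {
            System.out.println(token.length());
        }
    }
}
```

With the limit passed in as a parameter, a caller who expects long tokens could raise it instead of being surprised by the split, and the default could still be documented in the Javadoc.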
- WhiteSpaceTokenizer Sheng
- Re: WhiteSpaceTokenizer Jack Krupansky
- Re: WhiteSpaceTokenizer Sheng
- Re: WhiteSpaceTokenizer Jack Krupansky