Thanks, Jack. I haven't added myself to the contributor list yet; I'll do that, then log in and comment on that ticket. One quick comment: if relaxing the limit is still up for debate, wouldn't it be more reasonable to throw an exception when a token's length exceeds 255? That way the user would know immediately that something is wrong.
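To make the suggestion concrete, here is a rough sketch of the fail-fast behavior I have in mind. It is plain Java and not Lucene code: the class name StrictWhitespaceSplitter, the constant MAX_TOKEN_LEN, and the choice of IllegalArgumentException are all made up for illustration, and length is counted in Java chars. How this would fit into the real tokenizer is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Lucene: splits on whitespace and
// fails fast instead of silently mishandling an over-long token.
class StrictWhitespaceSplitter {
    // Mirrors the 255 limit discussed in this thread; the name is an assumption.
    static final int MAX_TOKEN_LEN = 255;

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 between tokens
        for (int i = 0; i <= text.length(); i++) {
            boolean atEnd = (i == text.length());
            if (atEnd || Character.isWhitespace(text.charAt(i))) {
                // Whitespace or end of input closes the current token, if any.
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else {
                if (start < 0) {
                    start = i;
                }
                // Reject as soon as the token grows past the limit.
                if (i - start + 1 > MAX_TOKEN_LEN) {
                    throw new IllegalArgumentException(
                        "token starting at offset " + start
                        + " is longer than " + MAX_TOKEN_LEN + " chars");
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("hello  wide world"));
        String tooLong = new String(new char[256]).replace('\0', 'x');
        try {
            split(tooLong);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A token of exactly 255 chars is still accepted here; only the 256th char triggers the exception, so the caller finds out at the offending token rather than from odd search results later.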

On Friday, August 15, 2014, Jack Krupansky <j...@basetechnology.com> wrote:

> Yeah, it should be documented better, and configurable.
>
> Some discussion of related issues here:
> https://issues.apache.org/jira/browse/LUCENE-1118
> https://issues.apache.org/jira/browse/SOLR-4148
>
> I actually filed a Jira for this already. No action so far, but PLEASE
> feel free to comment on it:
> https://issues.apache.org/jira/browse/LUCENE-5785
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Sheng
> Sent: Thursday, August 14, 2014 11:38 PM
> To: java-user@lucene.apache.org
> Subject: WhiteSpaceTokenizer
>
> The length of token has to be shorter than 255, otherwise there will
> be unpredictable behaviors for this tokenizer. I see 255 is set as a
> private final in the src code, but there is no documentation to explicitly
> address that. Can we either make that number configurable (if not an
> option, I'd like to know why), or put some notes to its java doc? I had a
> hard time to figure that out...