Hi Otis,
Thanks for the information. I'm actually writing something to search files
containing code (such as JSP files), so I do expect a few problems like
this, because I guess Lucene's out-of-the-box analyzers are really geared
toward natural languages. But I was wondering if you could
Richard,
WhitespaceTokenizer (the tokenizer that WhitespaceAnalyzer uses) really just
splits on whitespace characters:
  /** Collects only characters which do not satisfy
   * {@link Character#isWhitespace(char)}.*/
  protected boolean isTokenChar(char c) {
    return !Character.isWhitespace(c);
  }
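To illustrate what this means for code such as JSP, here is a minimal stand-alone Java sketch that imitates the predicate quoted above without depending on Lucene. The class WhitespaceSplitDemo and its tokenize method are hypothetical, and the loop is a simplified imitation of a character-based tokenizer, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSplitDemo {

    // Same rule as the isTokenChar quoted above: a character belongs to a
    // token unless Character.isWhitespace(c) is true.
    static boolean isTokenChar(char c) {
        return !Character.isWhitespace(c);
    }

    // Hypothetical helper: collects maximal runs of token characters,
    // emitting a token each time a whitespace character ends a run.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush the last token if the text does not end with whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String jsp = "<%= user.getName() %>\n<c:out value=\"${item.price}\"/>";
        for (String token : tokenize(jsp)) {
            System.out.println(token);
        }
    }
}
```

Because only whitespace separates tokens, punctuation stays attached: "user.getName()" comes out as a single token, so a query for getName alone would not match it unless you use a tokenizer that also splits on characters such as '.' and '('.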