RE: Bug in CJKTokenizer

2008-07-18 Thread Scott Smith
July 18, 2008 3:29 PM
To: java-user@lucene.apache.org
Subject: RE: Bug in CJKTokenizer

Hi Scott,

I think this sounds reasonable, but why not also add LATIN_EXTENDED_B and LATIN_EXTENDED_ADDITIONAL? AFAICT, among other things, these cover some eastern European languages and Vietnamese, respect

RE: Bug in CJKTokenizer

2008-07-18 Thread Steven A Rowe
Hi Scott,

I think this sounds reasonable, but why not also add LATIN_EXTENDED_B and LATIN_EXTENDED_ADDITIONAL? AFAICT, among other things, these cover some eastern European languages and Vietnamese, respectively.

Steve

On 07/18/2008 at 5:03 PM, Scott Smith wrote:
> org.apache.lucene.analysis.
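
For readers following the thread, here is a rough sketch of the kind of block check Steve's suggestion implies. It assumes the tokenizer classifies each character by its `java.lang.Character.UnicodeBlock`. The class name `LatinBlocks`, the method `isLatin`, and the exact set of blocks are hypothetical and chosen for illustration only; this is not the CJKTokenizer source and not the patch discussed above.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: treats a character as "Latin"
// if its Unicode block is in an explicit allow-list.
final class LatinBlocks {

    // Assumed allow-list. The last two entries are the blocks Steve
    // proposes adding; the first three are an assumption about what
    // a Latin check would already cover.
    private static final Set<UnicodeBlock> LATIN = new HashSet<UnicodeBlock>(Arrays.asList(
            UnicodeBlock.BASIC_LATIN,
            UnicodeBlock.LATIN_1_SUPPLEMENT,
            UnicodeBlock.LATIN_EXTENDED_A,
            UnicodeBlock.LATIN_EXTENDED_B,          // e.g. U+0219, Romanian s with comma below
            UnicodeBlock.LATIN_EXTENDED_ADDITIONAL  // e.g. U+1EBF, Vietnamese e with circumflex and acute
    ));

    private LatinBlocks() {}

    // UnicodeBlock.of returns null for characters outside any defined block;
    // HashSet.contains(null) is simply false, so no special case is needed.
    static boolean isLatin(char c) {
        return LATIN.contains(UnicodeBlock.of(c));
    }

    public static void main(String[] args) {
        char[] samples = {'a', '\u0219', '\u1EBF', '\u4E2D'};
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " latin=" + isLatin(c));
        }
    }
}
```

With a check like this, Romanian and Vietnamese letters from the two extra blocks would take the same path as ASCII letters, while CJK ideographs such as U+4E2D would not.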