ICUTokenizer and CJK

Burton-West, Tom Mon, 22 Nov 2010 15:51:24 -0800

Hi all,

I see in the javadoc for the ICUTokenizer that it has special handling for
Lao, Myanmar, and Khmer word breaking, but the javadoc gives no details about
what it does with CJK. For Chinese and Japanese it appears to break text into
unigrams (one token per character). Is this correct?
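To make "breaking into unigrams" concrete, here is a rough, stdlib-only Java sketch of the behavior I think I am seeing. It is a hypothetical illustration, not ICUTokenizer's implementation (which is driven by ICU break rules). The class name `UnigramSketch` and its rules are my own assumptions: Han, Hiragana and Katakana code points each become a single token, while other runs of letters or digits stay whole.

```java
import java.util.ArrayList;
import java.util.List;

public class UnigramSketch {
    // Assumption: treat Han, Hiragana and Katakana as "C and J" scripts
    // that are emitted one code point per token.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA;
    }

    // Hypothetical tokenizer: CJK code points become unigrams,
    // other letter/digit runs become whole tokens, everything else separates.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flush(run, tokens);                      // end any pending non-CJK word
                tokens.add(new String(Character.toChars(cp))); // one token per CJK char
            } else if (Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(cp);                 // extend the current word
            } else {
                flush(run, tokens);                      // whitespace/punctuation ends a word
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Emit the pending run (if any) as a token and reset it.
    static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene 中文分词"));
    }
}
```

Running `main` prints `[Lucene, 中, 文, 分, 词]`, i.e. the Latin word stays whole while each Chinese character becomes its own token.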

Tom
