Re: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Daniel Noll
Steven A Rowe wrote:
> Korean has been treated differently from Chinese and Japanese since LUCENE-461. The grouping of Hangul with digits was introduced in this issue.

Certainly I found LUCENE-461 during my search, and certainly grouping togeth

RE: StandardTokenizer and Korean grouping with alphanum

2008-09-22 Thread Steven A Rowe
Hi Daniel,

On 09/22/2008 at 12:49 AM, Daniel Noll wrote:
> I have a question about Korean tokenisation. Currently there
> is a rule in StandardTokenizerImpl.jflex which looks like this:
>
> ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+

LUCENE-1126

StandardTokenizer and Korean grouping with alphanum

2008-09-21 Thread Daniel Noll
Hi all.

I have a question about Korean tokenisation. Currently there is a rule in StandardTokenizerImpl.jflex which looks like this:

  ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+

I'm wondering if there was some good reason why it isn't:

  ALPHANUM = (({LETTER}|{DIGIT})+|{KOREAN}+)

Basically I'
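To make the difference between the two ALPHANUM rules concrete, here is a minimal sketch that emulates them with `java.util.regex` instead of JFlex. The macro bodies are simplified assumptions, not the definitions in StandardTokenizerImpl.jflex: KOREAN is taken to be the Hangul Syllables and Hangul Jamo blocks, LETTER is any other Unicode letter, and DIGIT is `\p{Nd}`. The class name `KoreanAlphanumDemo` is hypothetical, and the tokenizer's other rules are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KoreanAlphanumDemo {
    // Simplified stand-ins for the JFlex macros (assumptions, not Lucene's exact definitions).
    static final String KOREAN = "[\\uAC00-\\uD7AF\\u1100-\\u11FF]";              // Hangul syllables and Jamo
    static final String LETTER = "[\\p{L}&&[^\\uAC00-\\uD7AF\\u1100-\\u11FF]]";   // any letter except Hangul
    static final String DIGIT = "\\p{Nd}";                                         // decimal digits

    // Current rule: ALPHANUM = ({LETTER}|{DIGIT}|{KOREAN})+
    static final Pattern CURRENT =
        Pattern.compile("(?:" + LETTER + "|" + DIGIT + "|" + KOREAN + ")+");

    // Proposed rule: ALPHANUM = (({LETTER}|{DIGIT})+|{KOREAN}+)
    static final Pattern PROPOSED =
        Pattern.compile("(?:" + LETTER + "|" + DIGIT + ")+|" + KOREAN + "+");

    // Collects every non-overlapping match, scanning left to right;
    // characters that match neither rule (e.g. spaces) are skipped.
    static List<String> tokenize(Pattern rule, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = rule.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hangul "ga na da", then "2008", then Hangul "nyeon", then " Lucene23"
        String text = "\uAC00\uB098\uB2E4" + "2008" + "\uB144" + " Lucene23";
        System.out.println("current:  " + tokenize(CURRENT, text));
        System.out.println("proposed: " + tokenize(PROPOSED, text));
    }
}
```

With the current rule, the Hangul run and the digits fuse into one token, 가나다2008년; with the proposed rule they split into 가나다, 2008 and 년, while Lucene23 is unaffected. Because the two alternatives in the proposed pattern begin with disjoint character classes, Java's leftmost-alternative matching produces the same tokens that JFlex's longest-match rule would for this input.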