RE: Highlighter apply to Japanese

2005-09-06 Thread Koji Sekiguchi
> Sent: Tuesday, September 06, 2005 7:22 PM
> To: java-user@lucene.apache.org
> Subject: RE: Highlighter apply to Japanese
>
>
> Try change TokenGroup.isDistinct();
>
> Maybe the offset test code should be >= rather than >
> ie
>
> boolean isDistinct(

RE: Highlighter apply to Japanese

2005-09-06 Thread mark harwood
Try changing TokenGroup.isDistinct(). Maybe the offset test should be >= rather than >, i.e.

boolean isDistinct(Token token)
{
    return token.startOffset() >= endOffset;
}

I've just tried the change with the JUnit test and all still seems well with the non CJK
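To make the effect of the suggested comparison concrete, here is a minimal, self-contained sketch. It does not use Lucene itself: `Tok`, `group`, `singleChars` and `bigrams` are hypothetical stand-ins invented for illustration, and the grouping loop is only an assumption about how tokens with overlapping offsets get merged, based on the description in this thread. It compares the strict test (`>`) with the suggested one (`>=`) on tokens that touch end to start, as a one-character-per-token analyzer might emit, and on overlapping bigrams.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGroupSketch {
    /** Hypothetical stand-in for a Lucene Token: only the character offsets matter here. */
    static final class Tok {
        final int start;
        final int end;
        Tok(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Merges tokens into groups, each returned as {startOffset, endOffset}.
     * A token opens a new group when it is "distinct" from the current group:
     * start > groupEnd (strict) or start >= groupEnd (inclusive, the suggested change).
     */
    static List<int[]> group(List<Tok> tokens, boolean inclusive) {
        List<int[]> groups = new ArrayList<>();
        int[] current = null;
        for (Tok t : tokens) {
            boolean distinct = current == null
                    || (inclusive ? t.start >= current[1] : t.start > current[1]);
            if (distinct) {
                current = new int[] { t.start, t.end };
                groups.add(current);
            } else {
                // Token overlaps (or, under the strict test, merely touches) the group.
                current[1] = Math.max(current[1], t.end);
            }
        }
        return groups;
    }

    /** One token per character: [0,1], [1,2], ... */
    static List<Tok> singleChars(int length) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < length; i++) out.add(new Tok(i, i + 1));
        return out;
    }

    /** Overlapping two-character tokens: [0,2], [1,3], ... */
    static List<Tok> bigrams(int length) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i + 2 <= length; i++) out.add(new Tok(i, i + 2));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("single chars, >  : " + group(singleChars(3), false).size());
        System.out.println("single chars, >= : " + group(singleChars(3), true).size());
        System.out.println("bigrams,      >  : " + group(bigrams(4), false).size());
        System.out.println("bigrams,      >= : " + group(bigrams(4), true).size());
    }
}
```

Under these assumptions, the strict test merges a run of adjacent single-character tokens into one group, while `>=` keeps them separate. Overlapping bigrams end up in a single group with either comparison, which fits the remark elsewhere in this thread that n-gram analysis behaves differently from single-character analysis.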

RE: Highlighter apply to Japanese

2005-09-06 Thread Koji Sekiguchi
org
> Subject: Re: Highlighter apply to Japanese
>
>
> Hi, Koji,
>
> I had the same problem as you. This is because CJK's n-gram analysis
> is different from single character's.
>
> My get around is to use CJKHighlighter and
> CJKHighlightAnalyzer in sandbox.
>

RE: Highlighter apply to Japanese

2005-09-06 Thread Koji Sekiguchi
tokens to me. Any thoughts?

Koji

> -----Original Message-----
> From: markharw00d [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 06, 2005 3:37 PM
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter apply to Japanese
>
>
> I don't know the behaviour

Re: Highlighter apply to Japanese

2005-09-05 Thread Chris Lu
Hi, Koji, I had the same problem as you. This is because CJK's n-gram analysis is different from single-character analysis. My workaround is to use CJKHighlighter and CJKHighlightAnalyzer in the sandbox. -- Chris Lu Lucene Search RAD on Any Database http://www.dbsight.net On 9/5/05, Koji Se

Re: Highlighter apply to Japanese

2005-09-05 Thread markharw00d
I don't know the behaviour of the Japanese Analyzer you are using. Can you add to your example diagnosis the Token.getPositionIncrement, Token.startOffset and Token.endOffset for each of the tokens? The highlighter groups tokens with overlapping start and end offsets into a single TokenGroup f