StandardAnalyzer's JavaCC-based StandardTokenizer.jj does not recognize
the Korean (Hangul) block of Unicode characters.
You should either 1) use CJKAnalyzer, or 2) add the Korean character
block (0xAC00~0xD7AF) to the CJK token definition in the
StandardTokenizer.jj file.
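For option 2, the edit might look something like the sketch below, based
on the CJ token definition as it appears in Lucene 1.4.x (the exact
ranges already present in your copy of the grammar may differ):

| < CJ:                                       // Chinese, Japanese, Korean
      [
       "\u3040"-"\u318f",
       "\u3300"-"\u337f",
       "\u3400"-"\u3d2d",
       "\u4e00"-"\u9fff",
       "\uf900"-"\ufaff",
       "\uac00"-"\ud7af"                      // Hangul syllables (the added range)
      ]
  >

After changing the grammar, regenerate the tokenizer with JavaCC,
rebuild, and reindex your documents.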
Hope it helps.
Would you share what the problem is?
I used CJKAnalyzer for Korean for over 2 years without any problem.
(I remember that there was some query-result problem with StandardAnalyzer
at that time.)
But I am trying to switch to StandardAnalyzer again.
Thanks,
Youngho
----- Original Message -----
Hi:
We are running into problems searching Korean documents. We are
using the StandardAnalyzer, and everything works with Chinese and Japanese.
Are there known problems with Korean in Lucene?
Thanks
-John
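For anyone trying the CJKAnalyzer route, here is a minimal indexing
sketch. It assumes Lucene 1.4.x with the contrib analyzers jar on the
classpath; the index directory, field name, and sample text are
arbitrary examples:

import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class KoreanIndexSketch {
    public static void main(String[] args) throws Exception {
        // CJKAnalyzer emits overlapping character bigrams for CJK text
        // (including Hangul) instead of dropping the characters.
        IndexWriter writer = new IndexWriter("korean-index",
                                             new CJKAnalyzer(),
                                             true); // create a new index
        Document doc = new Document();
        doc.add(Field.Text("contents", "한국어 텍스트")); // sample Korean text
        writer.addDocument(doc);
        writer.close();
    }
}

Remember to use the same analyzer at query time (e.g., when constructing
a QueryParser) so the bigram tokens produced by the query match the ones
in the index.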
See IndexWriter.setMaxFieldLength()
-Yonik
Now hiring -- http://tinyurl.com/7m67g
On 10/3/05, Tricia Williams <[EMAIL PROTECTED]> wrote:
To follow up on my post from Thursday: I have written a very basic test
for TermPositions. This test allows me to identify that only the
first 10001 tokens are considered to determine term frequency (i.e., when
the search term is at a position greater than 10001, my test fails).
Is this by design?
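Following up on Yonik's pointer above: IndexWriter indexes only the
first 10,000 tokens of a field by default, which lines up with the
cutoff the test observes, and tokens past the limit are silently
dropped at index time, so the truncation is by design. Here is a
minimal sketch of raising the limit, assuming a Lucene version that
exposes the setMaxFieldLength() setter (earlier releases use the
public maxFieldLength field instead); the value 100000 is only an
example:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MaxFieldLengthSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("test-index",
                                             new StandardAnalyzer(),
                                             true); // create a new index
        // Raise the per-field token limit from the 10,000-token default
        // before adding any long documents.
        writer.setMaxFieldLength(100000); // example value
        writer.close();
    }
}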