Re: korean and lucene

2005-10-03 Thread Cheolgoo Kang
StandardAnalyzer's JavaCC-based StandardTokenizer.jj does not recognize the Korean (Hangul) part of the Unicode character blocks. You should either 1) use CJKAnalyzer, or 2) add the Korean character block (0xAC00~0xD7AF) to the CJK token definition in the StandardTokenizer.jj file. Hope it helps.
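
A minimal sketch of option 1, assuming the contrib CJKAnalyzer and the Lucene 1.x TokenStream API (the field name "contents" and the sample Korean string are illustrative):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cjk.CJKAnalyzer;

    // Print the tokens CJKAnalyzer produces for a Korean string.
    public class KoreanTokens {
        public static void main(String[] args) throws Exception {
            CJKAnalyzer analyzer = new CJKAnalyzer();
            TokenStream stream =
                analyzer.tokenStream("contents", new StringReader("루씬 검색 라이브러리"));
            // CJKAnalyzer emits overlapping bigrams for CJK/Hangul runs,
            // so Korean text becomes searchable without a morphological analyzer.
            for (Token t = stream.next(); t != null; t = stream.next()) {
                System.out.println(t.termText());
            }
        }
    }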

Re: korean and lucene

2005-10-03 Thread Youngho Cho
Would you share what the problem is? I have used CJKAnalyzer for Korean for over two years without any problem. (I remember there was some query-result problem with StandardAnalyzer at that time.) But I am trying to switch back to StandardAnalyzer again. Thanks, Youngho

korean and lucene

2005-10-03 Thread John Wang
Hi: We are running into problems with searching Korean documents. We are using the StandardAnalyzer, and everything works with Chinese and Japanese. Are there known problems with Korean in Lucene? Thanks -John

Re: TermDocs.freq()

2005-10-03 Thread Yonik Seeley
See IndexWriter.setMaxFieldLength(). -Yonik (Now hiring -- http://tinyurl.com/7m67g)
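
A minimal sketch of the fix Yonik points at, assuming an IndexWriter that has the setter (the directory, analyzer, and new limit here are illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    // Raise the per-field token limit before adding large documents.
    public class RaiseFieldLimit {
        public static void main(String[] args) throws Exception {
            IndexWriter writer =
                new IndexWriter(new RAMDirectory(), new StandardAnalyzer(), true);
            // The default limit is 10,000 tokens per field; tokens beyond it
            // are silently dropped at index time.
            writer.setMaxFieldLength(Integer.MAX_VALUE);
            // ... addDocument() calls go here ...
            writer.close();
        }
    }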

Re: TermDocs.freq()

2005-10-03 Thread Tricia Williams
To follow up on my post from Thursday: I have written a very basic test for TermPositions. The test shows that only the first 10001 tokens are considered when determining term frequency (i.e., with the search term at a position greater than 10001, my test fails). Is this by design?
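
A sketch of the kind of test described above, under the same API assumptions as the previous example (the field name, marker token, and positions are illustrative); a marker term placed past the default 10,000-token limit never shows up through TermDocs:

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.store.RAMDirectory;

    // Put a marker token past the default 10,000-token limit and check
    // whether TermDocs can still see it.
    public class FieldTruncationTest {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
            // Uncomment to make the marker token visible again:
            // writer.setMaxFieldLength(Integer.MAX_VALUE);

            StringBuffer text = new StringBuffer();
            for (int i = 0; i < 15000; i++) {
                text.append(i == 12345 ? "needle " : "filler ");
            }
            Document doc = new Document();
            doc.add(new Field("contents", text.toString(),
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
            writer.close();

            IndexReader reader = IndexReader.open(dir);
            TermDocs docs = reader.termDocs(new Term("contents", "needle"));
            System.out.println(docs.next()
                ? "found, freq=" + docs.freq()
                : "not found -- the token fell past maxFieldLength");
            reader.close();
        }
    }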