Hi,

I still have a problem searching for Chinese words.
The XML file that serves as the data source, and the analyzer, have already
been encoded. I have tested StandardAnalyzer, CJKAnalyzer, and
ChineseAnalyzer, but I still can't get any results.

1.      Do we need any encoding configuration in Apache Tomcat for Chinese
search using Lucene?

2.      Do we need to use a JSP meta tag or page encoding? What should the
encoding be for the JSP?


 
Regards,
Lee Li Bin

-----Original Message-----
From: Chris Lu [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 18, 2007 2:10 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene for chinese search

There are three things to watch out for with Chinese or other CJK languages:

1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't support Chinese words well. Use either
ChineseAnalyzer or CJKAnalyzer. My experience is that CJKAnalyzer is a
little better.
3. The user's query should be encoded in UTF-8.
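
As a rough illustration of why points 1 and 3 matter, here is a minimal
plain-Java sketch (no Lucene involved). It assumes the query arrives as UTF-8
bytes and is decoded with the wrong charset somewhere along the way. The class
and method names are hypothetical and exist only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: hypothetical names, not part of Lucene or any container API.
class QueryEncodingSketch {

    // Decodes raw request bytes with the given charset, as a container would.
    static String decodeAs(byte[] rawBytes, Charset charset) {
        return new String(rawBytes, charset);
    }

    // Recovers text that was UTF-8 on the wire but was decoded as ISO-8859-1.
    // This works because ISO-8859-1 maps every byte to exactly one char.
    static String repairLatin1Misdecode(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String indexedTerm = "\u4e2d\u6587"; // the two characters of "Chinese (language)"
        byte[] wire = indexedTerm.getBytes(StandardCharsets.UTF_8);

        String good = decodeAs(wire, StandardCharsets.UTF_8);
        String bad = decodeAs(wire, StandardCharsets.ISO_8859_1);

        System.out.println(good.equals(indexedTerm)); // true
        System.out.println(bad.equals(indexedTerm));  // false: this query cannot match the index
        System.out.println(repairLatin1Misdecode(bad).equals(indexedTerm)); // true
    }
}
```

If the wrongly decoded string is what reaches the analyzer, no analyzer choice
will help, because the query terms no longer match the indexed text. In a
servlet or JSP setup the usual fix is to have parameters decoded as UTF-8 to
begin with, for example by calling request.setCharacterEncoding("UTF-8") before
reading parameters from a POST body. The exact setting depends on the container.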

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes


On 6/17/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to know whether StandardAnalyzer allows searching for Chinese
> words.
>
> And in order to support Chinese searching, is any particular encoding
> needed when developing the application?
>
> I'm currently using Jetty as the web server and JSP for the application.
> Search results will be saved in an XML file and displayed using XSL. So is
> an encoding needed for any of the files (XML, XSL, etc.), as well as when
> parsing the query?
>
> Thanks a lot
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


