Regarding point #2: if neither of those analyzers works for you, you could always try the n-gram tokenizers and token filters:

$ ll analyzers/src/java/org/apache/lucene/analysis/ngram/
total 48
-rw-rw-r-- 1 otis otis 4934 Mar  2 16:32 EdgeNGramTokenFilter.java
-rw-rw-r-- 1 otis otis 4617 Feb 21 15:33 EdgeNGramTokenizer.java
-rw-rw-r-- 1 otis otis 3257 Mar  2 17:12 NGramTokenFilter.java
-rw-rw-r-- 1 otis otis 3103 Mar  2 16:33 NGramTokenizer.java
drwxrwxr-x 7 otis otis 4096 May 31 10:11 .svn/

Otis
--
Lucene Consulting -- http://lucene-consulting.com/

----- Original Message ----
From: Chris Lu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Sunday, June 17, 2007 8:09:30 PM
Subject: Re: Lucene for chinese search

There are three things to watch out for with Chinese or other CJK languages:
1. The content source or database needs to be encoded in UTF-8.
2. StandardAnalyzer doesn't support Chinese words well. Use either ChineseAnalyzer or CJKAnalyzer. My experience is that CJKAnalyzer is a little better.
3. The user's query should be encoded in UTF-8.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 6/17/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to know whether Standard Analyzer allows searching of Chinese
> words?
>
> And in order to support Chinese searching, is there any encoding needed in
> order to develop the application?
>
> I'm currently using Jetty as the web server and JSP for the application; search
> results will be saved in an XML file and displayed using XSL. So is there an
> encoding needed for any of the files (XML, XSL, etc.) as well as during
> parsing of the query?
>
> Thanks a lot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
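
For anyone wondering why n-grams help with Chinese text: there are no spaces between words, so one workaround is to index overlapping character sequences instead of words. Below is a rough, plain-Java sketch of that idea. It does not use Lucene and is not the code in NGramTokenizer.java; the class name CharNGrams and the method ngrams are made up for illustration, and the order in which the grams come out is my own choice, not necessarily what Lucene emits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates character n-grams only.
class CharNGrams {

    /**
     * Returns every n-gram of the text for n in [minGram, maxGram],
     * shorter grams first. Lengths are counted in Unicode code points,
     * so characters outside the BMP are not split in half.
     */
    static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        int[] codePoints = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            // Slide a window of n code points across the text.
            for (int start = 0; start + n <= codePoints.length; start++) {
                grams.add(new String(codePoints, start, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587\u641c\u7d22" is a four-character Chinese string,
        // written with escapes so the source file encoding does not matter.
        System.out.println(ngrams("\u4e2d\u6587\u641c\u7d22", 2, 2));
    }
}
```

With minGram = maxGram = 2 this yields three overlapping bigrams for the four-character string. A query tokenized the same way can then match on the bigrams it shares with the indexed text, at the cost of a larger index.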