Hi, thanks guys for helping me.
I forgot to use back the same analyzer for searching, that's why I can't search for Chinese words.. :) -----Original Message----- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 19, 2007 4:37 AM To: java-user@lucene.apache.org Subject: Re: Lucene for chinese search Hi, Karl, Thanks for sharing this experience. I did find CJKAnalyzer somehow behaves differently than ChineseAnalyzer. When trying to highlight the matched term, ChineseAnalyzer didn't work somehow. But I didn't investigate into it. This is a useful clue for it. -- Chris Lu ------------------------- Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m inutes On 6/18/07, karl wettin <[EMAIL PROTECTED]> wrote: > A year or two ago I hacked Lucene to use UTF16 instead of UTF8 as CJK > characters are represented by 3 bytes with UTF8, and 2 bytes as > UTF16. It is a simple hack. > > It did however not save me that much as I had a mixed latin and CJK > corpus, and I reverted. Still think it is something worth > considering. Perhaps it might be worth implementing per index, per > document or per field string encoding strategy. > > > > > 18 jun 2007 kl. 20.01 skrev Chris Lu: > > > Basically where ever you see, the encoding should be utf8. > > > > The servlet also has an encoding setting. For your case, change the > > tomcat setting. > > When rendering jsp page, the encoding also matters. > > > > -- > > Chris Lu > > ------------------------- > > Instant Scalable Full-Text Search On Any Database/Application > > site: http://www.dbsight.net > > demo: http://search.dbsight.com > > Lucene Database Search in 3 minutes: > > http://wiki.dbsight.com/index.php? > > title=Create_Lucene_Database_Search_in_3_minutes > > > > On 6/18/07, Lee Li Bin <[EMAIL PROTECTED]> wrote: > >> > >> Hi, > >> > >> For indexing, there is no problem, there is Chinese text similar > >> to my > >> datasource (XML) in the index file when opening on a note pad. > >> > >> When I try to use the utf8 in jsp and, getbytes array of 'utf-8' or > >> ISO88599_1 or Cp1252 in Java servlet, but we getting search > >> problem, the > >> search result does not display for Chinese term. > >> > >> I mixed English and Chinese text in my datasource, the search is > >> working for > >> English term, and Chinese char display as '???' in the result output. > >> > >> Please advice or send some sample / solutions > >> > >> Thanks. > >> > >> -----Original Message----- > >> From: Mathieu Lecarme [mailto:[EMAIL PROTECTED] > >> Sent: Monday, June 18, 2007 8:58 PM > >> To: java-user@lucene.apache.org > >> Subject: Re: Lucene for chinese search > >> > >> Lee Li Bin a écrit : > >> > Hi, > >> > > >> > I still met problem for searching of Chinese words. > >> > XMl file which is the datasource and analyzer has already been > >> encoded. > >> > Have testing on StandardAnalyzer, CJKAnalyzer, and > >> ChineseAnalyzer, but it > >> > still can't get any results. > >> > > >> > 1. do we need any encoding configuration in apache tomcat for > >> Chinese > >> > search using Lucence > >> > > >> > 2. do we need to use JSP meta / page encoding ? what is the > >> encoding > >> > for jsp? > >> > > >> try first with simple junit test, after you can fight with UTF8 > >> parameters. > >> > >> M. > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]