Hello,
I am a beginner in using Lucene. My files are contains different language (English, Chinese, Portuguese, Japanese and some Asian languages, non-latin languages). They always contain in one file. Therefore, I have to use UTF-8 to save the contents. I am now developing a web-based search engine. I use Lucene to create index for those files and search it in web. The charset of the web page is UTF-8, but it cannot search anything. I try to use some Analyser (CJKAnalyser, ChineseAnalyser, StandardAnalyser, SimpleAnalyser), still failed. Finally, I tested to use original charset, for example, the Chinese contents I used BIG5, and I can search it very well. For those English, of couse, no problem. But I can't use UTF-8 as the charset for documents. Any suggest and examples ? Best regards, Eric --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]