Read this:
http://www.crazysquirrel.com/compgen/form-encoding.php
Then, in your receiving servlet:
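String raw = request.getParameter("query");
// Recover the raw bytes (containers decode as ISO-8859-1 by default), then re-decode with the request's charset.
String query_string = new String(raw.getBytes("ISO-8859-1"), request.getCharacterEncoding());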
Then pass query_string to Lucene. This ensures that the string returned by getParameter() is decoded with the correct character encoding.
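The re-decoding step can be tried outside a servlet container. The following is only a minimal sketch: the class FormDecoding and the helper redecode() are hypothetical names, and it assumes the container decoded the parameter as ISO-8859-1 while the browser actually sent UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormDecoding {
    // Hypothetical helper: undo an ISO-8859-1 mis-decoding of bytes
    // that were really written in the charset `actual`.
    static String redecode(String misdecoded, Charset actual) {
        if (misdecoded == null) {
            return null; // getParameter() returns null for a missing parameter
        }
        // ISO-8859-1 maps every byte to exactly one char, so this recovers the original bytes.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Simulate a container that decodes UTF-8 form data as ISO-8859-1.
        String asReceived = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String repaired = redecode(asReceived, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original));
    }
}
```

Note that request.getCharacterEncoding() returns null when the browser sends no charset, so in a servlet it is safer to fall back to a fixed charset such as UTF-8 in that case. Calling request.setCharacterEncoding("UTF-8") before the first getParameter() is another option for POST bodies.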
Hope this helps!
Kind regards, Karl Øie
On 11. apr. 2005, at 11.54, Eric Chow wrote:
Hello,
I am a beginner in using Lucene.
My files contain several languages (English, Chinese, Portuguese, Japanese, and some other Asian, non-Latin languages), often mixed within a single file. Therefore, I have to use UTF-8 to save the contents.
I am now developing a web-based search engine. I use Lucene to create an index for those files and search it from the web. The charset of the web page is UTF-8, but searches return nothing.
I tried several analyzers (CJKAnalyzer, ChineseAnalyzer, StandardAnalyzer, SimpleAnalyzer), but it still failed.
Finally, I tested with the original charsets; for example, I used BIG5 for the Chinese content, and searching worked very well. English, of course, is no problem.
But I can't get it to work with UTF-8 as the charset for the documents. Any suggestions or examples?
Best regards, Eric
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
- ...I wonder if the really nerdy Klingons learn how to speak english?