Grant, Thanks for the timely reply. :-)
No, we do not have a specific language in mind. Basically, our document source could potentially contain any language in the world. Supporting English, Spanish, Italian, French, Chinese, Russian and Japanese would be the minimum set. Do you mean we will need different analyzer for each language? Then is that means we will need to know the language type of a document before we can index it? Thanks again. -----Original Message----- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Thursday, June 05, 2008 8:53 AM To: java-user@lucene.apache.org Subject: Re: How international languages are supported in Lucene Hi Michael, That's a pretty open ended question and, I'm assuming, by "international languages" you mean non-English :-). You might get some mileage out of http://wiki.apache.org/lucene-java/IndexingOtherLanguages but it is a bit out of date (namely the sandbox references). Lucene indexes non-English languages just like it does English. You need to figure out what Analyzer you need (have a look in the contrib/ Analyzers code/javadocs for many existing languages) and then pretty much everything else is the same. Namely, the same principals apply (what to store, index, etc.), as they do in English. Did you have something specific in mind? i.e. how to handle Chinese or some specific language? Lastly, if you do have a language in mind, try searching the mail archives for the name of that language. HTH, Grant On Jun 5, 2008, at 11:32 AM, Michael Siu wrote: > Would someone tell me how Lucene supports indexing and searching > documents > that contain international languages? What do I need to do in > additions to > using the StandardAnalyzer? > > > > Thanks. > > > > > > > -------------------------- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]