You can have multiple languages in the same index. Just make sure that your language identification process is consistent.
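To make the consistency point concrete, here is a toy model in Python. It is not Lucene's API and not Snowball. It assumes a single language-agnostic field that stores post-analysis terms as plain strings, and it uses hypothetical, drastically simplified suffix rules in place of real stemmers (TOY_SUFFIXES, toy_analyze and ToyIndex are all made up for this sketch). It shows that a query matches when it is analyzed with the same language rules as the document, misses when the wrong rules are used, and matches a document in another language when two words happen to produce the same string (German "Rat", council, vs. French "rat", rat).

```python
from collections import defaultdict

# Hypothetical, drastically simplified suffix lists standing in for real
# Snowball stemmers; they exist only to make this example runnable.
TOY_SUFFIXES = {
    "de": ("en", "e"),
    "fr": ("s",),
}

def toy_analyze(text, lang):
    """Lowercase, split on whitespace, strip the first matching toy suffix."""
    tokens = []
    for word in text.lower().split():
        for suffix in TOY_SUFFIXES[lang]:
            # Keep at least 3 characters so short words are left alone.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens

class ToyIndex:
    """One field; terms are compared as plain strings, whatever the language."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it

    def add(self, doc_id, text, lang):
        for term in toy_analyze(text, lang):
            self.postings[term].add(doc_id)

    def search(self, query, lang):
        """Return ids of docs containing every analyzed query term."""
        terms = toy_analyze(query, lang)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = ToyIndex()
index.add("de-1", "Der Rat tagt heute", "de")
index.add("fr-1", "Le rat mange des chats", "fr")

print(sorted(index.search("chats", "fr")))  # ['fr-1']: same rules at index and query time
print(sorted(index.search("chats", "de")))  # []: German rules leave "chats" unstemmed
print(sorted(index.search("Rat", "de")))    # ['de-1', 'fr-1']: cross-language collision
```

In the last query the German term and the French term are the same string after analysis, so the toy index has no way to tell them apart.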
You might still get some false positives. For example, if a German root has the same letters as a French root but means something different, documents in both languages will still match. I don't know how often that actually happens in practice. Lucene treats all _post-analysis_ tokens the same way; it is essentially language-agnostic, so as long as two tokens consist of the same Unicode characters, it treats them as the same term.

Jeff Wang
diCarta, Inc.

-----Original Message-----
From: Lorenzo Di Gaetano [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 20, 2006 7:52 AM
To: Lucene Mailing List
Subject: Indexing with SnowballAnalyzer and multiple languages in a single index

Hi all,

I'm working on the search API of a multi-language CMS, and I'm using the latest Lucene release. I'm using the SnowballAnalyzer in order to have stemmers for various languages.

I know that I must use the same analyzer for indexing and searching in order to obtain correct hits, but can I index content in various languages into the same index (passing the name of the language to the SnowballAnalyzer's constructor) and then search by passing the language to the SnowballAnalyzer's constructor? Or is it better to have a separate index per language (in different directories)?

Thank you in advance.

Lorenzo

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]