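You can have multiple languages in the same index.  Just make sure that
your language identification is consistent: the language you pick for a
document at index time must also be the one you pick for queries that
are meant to match it.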
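
As a rough illustration of why that consistency matters, here is a
minimal sketch in plain Java. It does not use Lucene: the class, the
tiny in-memory index, and the toy stem() function are all hypothetical
stand-ins (the real stemming would come from SnowballAnalyzer). It only
shows that a word stemmed as Italian at index time is found when the
query is stemmed as Italian too, and missed when the query is stemmed
as English.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ConsistentLanguageSketch {
    // Hypothetical toy stemmer, NOT Snowball: one plural rule per language.
    static String stem(String lang, String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (lang.equals("en") && w.endsWith("s")) {
            return w.substring(0, w.length() - 1);        // "books" -> "book"
        }
        if (lang.equals("it") && w.endsWith("i")) {
            return w.substring(0, w.length() - 1) + "o";  // "libri" -> "libro"
        }
        return w;
    }

    // Stemmed term -> ids of the documents that contain it.
    static final Map<String, Set<String>> index = new TreeMap<>();

    // Index every word of the text using the stemmer for the given language.
    static void add(String docId, String lang, String text) {
        for (String word : text.trim().split("\\s+")) {
            index.computeIfAbsent(stem(lang, word), k -> new TreeSet<>()).add(docId);
        }
    }

    // Stem the query word with the given language and look the term up.
    static Set<String> search(String lang, String queryWord) {
        Set<String> hits = index.get(stem(lang, queryWord));
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) {
        add("it-1", "it", "libri");
        System.out.println(search("it", "libri")); // [it-1]
        System.out.println(search("en", "libri")); // []
    }
}
```

The second lookup misses because the English rule leaves "libri"
untouched, while the index holds the Italian stem "libro".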

You might still get some false positives.  For example, if a German
root has the same letters as a French root but means something
different, the German documents can still show up for a French query.
Personally, I don't know how often that actually happens.

Lucene treats all _post-analysis_ tokens the same.  It is pretty much
language-ignorant, so as long as the Unicode characters are the same,
it treats the tokens as the same.
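
The false-positive case can be sketched the same way. Again this is
plain Java rather than Lucene, and the class, the in-memory index, and
the toy stem() function are assumptions made up for illustration; the
stems real Snowball stemmers produce for these words may differ. German
"Rasen" (lawn) and French "raser" (to shave) are reduced to the same
string here, and since the index keys carry no language, a French query
returns the German document as well.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SharedTermsSketch {
    // Hypothetical toy stemmer, NOT Snowball: strips one suffix per language.
    static String stem(String lang, String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (lang.equals("de") && w.endsWith("en")) {
            return w.substring(0, w.length() - 2);  // "rasen" -> "ras"
        }
        if (lang.equals("fr") && w.endsWith("er")) {
            return w.substring(0, w.length() - 2);  // "raser" -> "ras"
        }
        return w;
    }

    // Stemmed term -> document ids. The key is just a string; no language is kept.
    static final Map<String, Set<String>> index = new TreeMap<>();

    // Index every word of the text using the stemmer for the given language.
    static void add(String docId, String lang, String text) {
        for (String word : text.trim().split("\\s+")) {
            index.computeIfAbsent(stem(lang, word), k -> new TreeSet<>()).add(docId);
        }
    }

    // Stem the query word with the given language and look the term up.
    static Set<String> search(String lang, String queryWord) {
        Set<String> hits = index.get(stem(lang, queryWord));
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) {
        add("de-1", "de", "Rasen");  // German document
        add("fr-1", "fr", "raser");  // French document
        // A French query also returns the German document.
        System.out.println(search("fr", "raser")); // [de-1, fr-1]
    }
}
```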

Jeff Wang
diCarta, Inc.
-----Original Message-----
From: Lorenzo Di Gaetano [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 20, 2006 7:52 AM
To: Lucene Mailing List
Subject: Indexing with SnowballAnalyzer and multiple languages in a
single index

Hi all,

I'm working on the search API of a multi-language CMS, and I'm using the
latest Lucene release. I'm using the SnowballAnalyzer in order to have
stemmers for various languages. I know that I must use the same analyzer
for indexing and searching in order to obtain correct hits, but can I
index content for various languages (passing the name of the language
to the SnowballAnalyzer's constructor) into the same index and then
search by passing the query's language to the SnowballAnalyzer's
constructor?

Or is it better to have a separate index per language (in different
directories)?

Thank you in advance.

Lorenzo

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


