Hi,

I'd like to go into detail about the issues that occur when you want to index and search content in multiple languages.

I have read the Lucene in Action book and many threads on this mailing list, the most interesting so far being this one:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED]

The solution chosen/recommended by Doug Cutting in this message:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/[EMAIL PROTECTED]
is number 2:
having one index for all languages, with one Document per language of each content item, a field specifying that language, and a query filter applied when searching.
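To make that layout concrete, here is a minimal sketch of it. It is a toy in-memory model in plain Java, not Lucene code: the class SingleIndexSketch, its Doc type, and the whitespace tokenizing are all assumptions made up for illustration, and the language check only stands in for what a query filter would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy in-memory model of the single-index layout. This is NOT the Lucene API;
// every name here is hypothetical and only illustrates the data layout.
class SingleIndexSketch {

    // One entry per (content, language) pair, with the language kept as a field.
    static final class Doc {
        final String contentId;
        final String language;
        final String text;

        Doc(String contentId, String language, String text) {
            this.contentId = contentId;
            this.language = language;
            this.text = text;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(String contentId, String language, String text) {
        docs.add(new Doc(contentId, language, text));
    }

    // Walks the whole shared index and keeps only the documents whose language
    // field matches, which plays the role of the query filter.
    List<String> search(String term, String language) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (!d.language.equals(language)) {
                continue; // filtered out by language
            }
            for (String token : d.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(needle)) {
                    hits.add(d.contentId);
                    break; // one hit per document is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SingleIndexSketch index = new SingleIndexSketch();
        index.add("c1", "en", "the cat sleeps");
        index.add("c1", "fr", "le chat dort");       // same content, second language
        index.add("c2", "en", "live chat support");

        System.out.println(index.search("chat", "fr")); // prints [c1]
        System.out.println(index.search("chat", "en")); // prints [c2]
    }
}
```

Note that content c1 appears twice in the index, once per language, so the same search term can mean different things depending on the language field.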

While I think it is a good solution, I have two concerns:
- If you have N languages and you search in only one of them, you are searching an index roughly N times larger than necessary. Wouldn't it be better to have N indices for N languages? That way each index could benefit from its own specialized analyzer, and if you need to search in multiple languages, you just merge the results from the different indices.
- If you have content in multiple languages like we do (by that I don't mean multiple content items, each in its own language, but multiple content items, each available in many languages), you end up with an N-to-1 Document/content relation in the index. As far as updates, deletes, and multi-language searches are concerned, wouldn't it be simpler to always keep a 1-to-1 Document/content relation in an index?
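The one-index-per-language alternative can be sketched the same way. Again this is a toy in-memory model in plain Java rather than Lucene code: PerLanguageIndexSketch, Hit, and the term-count score are hypothetical names and simplifications. Each per-language map stands in for an index with its own analyzer, and the merge is a plain sort by score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory model of one index per language. This is NOT the Lucene API;
// every name here is hypothetical.
class PerLanguageIndexSketch {

    // A hit records which content matched, in which language index, and how well.
    static final class Hit {
        final String contentId;
        final String language;
        final int score;

        Hit(String contentId, String language, int score) {
            this.contentId = contentId;
            this.language = language;
            this.score = score;
        }
    }

    // language -> (contentId -> text): each inner map holds at most one entry
    // per content, i.e. a 1-to-1 Document/content relation inside each index.
    private final Map<String, Map<String, String>> indices = new HashMap<>();

    // Adding the same contentId again in the same language replaces the old text.
    void add(String contentId, String language, String text) {
        indices.computeIfAbsent(language, k -> new LinkedHashMap<>()).put(contentId, text);
    }

    // Deleting a content item has to touch every language index.
    void delete(String contentId) {
        for (Map<String, String> index : indices.values()) {
            index.remove(contentId);
        }
    }

    // Naive score: how many whitespace-separated tokens equal the term.
    private static int score(String text, String needle) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Searches only the requested language indices, then merges by descending score.
    List<Hit> search(String term, String... languages) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Hit> merged = new ArrayList<>();
        for (String language : languages) {
            Map<String, String> index =
                    indices.getOrDefault(language, Collections.<String, String>emptyMap());
            for (Map.Entry<String, String> entry : index.entrySet()) {
                int s = score(entry.getValue(), needle);
                if (s > 0) {
                    merged.add(new Hit(entry.getKey(), language, s));
                }
            }
        }
        // List.sort is stable, so ties keep the order of the requested languages.
        merged.sort((a, b) -> Integer.compare(b.score, a.score));
        return merged;
    }

    public static void main(String[] args) {
        PerLanguageIndexSketch index = new PerLanguageIndexSketch();
        index.add("c1", "en", "the cat sleeps");
        index.add("c1", "fr", "le chat dort");
        index.add("c2", "en", "live chat support chat");

        for (Hit h : index.search("chat", "en", "fr")) {
            System.out.println(h.contentId + " " + h.language + " " + h.score);
        }
        // prints "c2 en 2" and then "c1 fr 1"
    }
}
```

The raw counts are comparable here only because the scoring is trivial. In a real engine, scores computed against separate indices may not be directly comparable, which is part of what makes the merge step the delicate one.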

As you may have guessed, my original thought, even before I read those threads, was that solution number 3 might be more flexible/modular than the others. Of course, it also has its drawbacks:
- performance issues when doing multi-language searches, especially when merging results from different indices
- more complex to code
- other?

Can you clarify this?
Which solutions have you all chosen so far for indexing and searching multiple content items in multiple languages?

Thanks!

Olivier


