Hi,
I'd like to go into detail about the issues that occur when you want to
index and search content in multiple languages.
I have read the Lucene in Action book and many threads on this mailing list,
the most interesting so far being this one:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED]
The solution chosen/recommended by Doug Cutting in this message:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/[EMAIL PROTECTED]
is number 2:
Have one index for all languages, with one Document per language version
of each content item and a field specifying its language, and use a query
filter when searching.
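To make sure we are talking about the same thing, here is a minimal sketch
of option 2 as I understand it. It is plain Java and not Lucene code: the
class SingleIndexSketch, the Doc fields (contentId, lang, body) and the
substring matching are all made up for illustration, and the language check
only stands in for a real query filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of option 2: one shared index, one entry per (content, language).
// Plain Java, NOT the Lucene API; every name here is hypothetical.
final class SingleIndexSketch {

    // Stands in for a Document with "contentId", "lang" and "body" fields.
    static final class Doc {
        final String contentId;
        final String lang;
        final String body;

        Doc(String contentId, String lang, String body) {
            this.contentId = contentId;
            this.lang = lang;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // One content item in N languages produces N entries in the same index.
    void add(String contentId, String lang, String body) {
        docs.add(new Doc(contentId, lang, body));
    }

    // Returns the ids of entries in the given language whose body contains
    // the term. The language check plays the role of the query filter; in
    // this toy, the loop still visits the entries of every language.
    List<String> search(String term, String lang) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.lang.equals(lang)
                    && d.body.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(d.contentId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SingleIndexSketch index = new SingleIndexSketch();
        index.add("c1", "en", "Chat with the support team");
        index.add("c1", "fr", "Discuter avec le support");
        index.add("c2", "fr", "Le chat dort");
        System.out.println(index.search("chat", "fr"));
        System.out.println(index.search("chat", "en"));
    }
}
```

Note that content c1 appears twice in the index, once per language, which
is the N-to-1 Document/content relation I come back to further down.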
While I think it is a good solution, I have some concerns:
- If you have N languages and you search for something in one language,
you are going to search an index roughly N times too large.
Wouldn't it be better to have N indices for N languages? That way, each
index could benefit from its own specialized analyzer, and if you need to
search in multiple languages, you just need to merge the results from
those different indices.
- If you have content in multiple languages like we do, and by that I
don't mean many content items each having its own language, but
many content items each available in several languages, you are going to
have an N-to-1 Document/content relation in the index.
As far as updates, deletes, and searches in multiple languages are
concerned, wouldn't it be simpler to always keep a 1-to-1 Document/content
relation in an index?
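Here is a rough sketch of the alternative I have in mind: one index per
language, each with its own analyzer and at most one entry per content
item. Again this is plain Java rather than Lucene code, and everything in
it is an assumption for illustration: the class PerLanguageIndexSketch, the
two toy analyzers, the substring matching, and a merge step that simply
keeps each content id once without comparing scores.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy model of one index per language, each with its own analyzer.
// Plain Java, NOT the Lucene API; every name here is hypothetical.
final class PerLanguageIndexSketch {

    // Toy analyzers: lower-casing only, and lower-casing plus accent removal.
    static final UnaryOperator<String> LOWERCASE =
            s -> s.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> LOWERCASE_NO_ACCENTS =
            s -> Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");

    // One per-language index: content id -> analyzed text (1-to-1 with content).
    static final class LanguageIndex {
        final UnaryOperator<String> analyzer;
        final Map<String, String> textByContentId = new LinkedHashMap<>();

        LanguageIndex(UnaryOperator<String> analyzer) {
            this.analyzer = analyzer;
        }
    }

    private final Map<String, LanguageIndex> indices = new LinkedHashMap<>();

    void addLanguage(String lang, UnaryOperator<String> analyzer) {
        indices.put(lang, new LanguageIndex(analyzer));
    }

    // Putting the same content id again replaces the entry for that language.
    void put(String contentId, String lang, String body) {
        LanguageIndex index = indices.get(lang);
        if (index == null) {
            throw new IllegalArgumentException("unknown language: " + lang);
        }
        index.textByContentId.put(contentId, index.analyzer.apply(body));
    }

    // Deleting a content item means one delete per language index.
    void delete(String contentId) {
        for (LanguageIndex index : indices.values()) {
            index.textByContentId.remove(contentId);
        }
    }

    // Searches only the requested indices, analyzing the term with each
    // index's own analyzer, then merges hits keeping each content id once.
    List<String> search(String term, List<String> langs) {
        Set<String> merged = new LinkedHashSet<>();
        for (String lang : langs) {
            LanguageIndex index = indices.get(lang);
            if (index == null) {
                continue; // no index for this language
            }
            String needle = index.analyzer.apply(term);
            for (Map.Entry<String, String> e : index.textByContentId.entrySet()) {
                if (e.getValue().contains(needle)) {
                    merged.add(e.getKey());
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

A single-language search touches only that language's index, while a
multi-language search pays for one lookup per index plus the merge. The
sketch sidesteps the hard part of the merge, which is ranking hits that
come from different indices.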
As you may have guessed, my original thought, even before I read those
threads, was that solution number 3 might be more flexible/modular
than the others. Of course, it also has its drawbacks:
- performance issues when searching in multiple languages, especially when
merging the results of different indices
- more complex to code
- other?
Can you clarify this?
What solutions have you all chosen so far for indexing and
searching multiple content items in multiple languages?
Thanks!
Olivier