Multi-language document indexing

Ivan Vasilev Thu, 22 Feb 2007 05:05:13 -0800

Hi All,

Our application, which uses Lucene for indexing, will be used to index documents that each contain parts written in different languages. For example, a single document could contain English, Chinese and Brazilian Portuguese text. How should such a document be indexed? Is there a best practice for this?

What comes to my mind is to index 3 different Lucene Documents for the real document and to keep in a database the meta info that these 3 Documents are related to our real doc. For example, for myDoc.doc we will have myDocEn.doc, myDocCn.doc and myDocBr.doc in the index, and when a search finds the searched word in myDocCn.doc we will show myDoc.doc to the user. The disadvantage is that in this case the occurrences of the searched item will have to be recalculated for the real document. This is important for proximity queries like “Red NEAR/10 fox”. So if someone knows a better practice than this, please let me know.
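
To make the idea concrete, here is a rough sketch of the bookkeeping only. It is plain Java, not Lucene's API: the class name MultiLanguageIndexSketch, the methods addPart and search, the in-memory maps standing in for the index and the database, and the whitespace tokenizer are all assumptions made for illustration. In a real setup each language part would go through its own analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one sub-document per language, mapped back to the real document.
class MultiLanguageIndexSketch {
    // Stands in for the database meta info: sub-document id -> real document name.
    private final Map<String, String> parentOf = new HashMap<>();
    // Toy inverted index standing in for Lucene: term -> ids of sub-documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one language part of a real document, e.g. ("myDoc.doc", "Cn", text) -> "myDocCn.doc".
    void addPart(String realDoc, String lang, String text) {
        int dot = realDoc.lastIndexOf('.');
        String base = dot < 0 ? realDoc : realDoc.substring(0, dot);
        String ext = dot < 0 ? "" : realDoc.substring(dot);
        String subId = base + lang + ext;
        parentOf.put(subId, realDoc);
        // Assumption: whitespace splitting replaces a proper per-language analyzer.
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(subId);
            }
        }
    }

    // Search the sub-documents, but report the real documents they belong to.
    Set<String> search(String term) {
        Set<String> realDocs = new TreeSet<>();
        Set<String> hits = postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                                 Collections.<String>emptySet());
        for (String subId : hits) {
            realDocs.add(parentOf.get(subId));
        }
        return realDocs;
    }

    public static void main(String[] args) {
        MultiLanguageIndexSketch index = new MultiLanguageIndexSketch();
        index.addPart("myDoc.doc", "En", "the red fox");
        index.addPart("myDoc.doc", "Cn", "红色 狐狸");
        System.out.println(index.search("狐狸")); // prints [myDoc.doc]
    }
}
```

The sketch covers only the mapping from a hit in myDocCn.doc back to myDoc.doc. It does not address the position problem mentioned above: offsets inside a sub-document are relative to that part, not to the real document.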


Thanks in advance,
Ivan

