Hi All,

Our application, which uses Lucene for indexing, will index documents that each contain parts written in different languages. For example, a single document could contain English, Chinese and Brazilian Portuguese text. How should such a document be indexed? Is there a best practice for this?

What comes to mind is to index three separate Lucene Documents for each real document, and to keep in a database the meta information that these three Documents belong to the real document. For example, for myDoc.doc the index would contain myDocEn.doc, myDocCn.doc and myDocBr.doc; when a search term is found in myDocCn.doc, we would show myDoc.doc to the user. The disadvantage is that the occurrences of the search term would then have to be recalculated for the real document, which matters for queries like “Red NEAR/10 fox”. If anyone knows a better practice than this, please let me know.
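
To make the idea concrete, here is a minimal sketch of the bookkeeping in plain Java, with no Lucene calls at all. The names MultiLangIndexSketch, Part and countByParent are hypothetical and exist only for illustration, and the case-insensitive substring match is merely a stand-in for a real per-language analyzer and query. It shows each language part carrying the id of its real document, so that hits in any part are reported against the real document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not Lucene code, only the part-to-parent bookkeeping.
class MultiLangIndexSketch {

    /** One language-specific part of a real document (hypothetical type). */
    static final class Part {
        final String parentId; // the real document, e.g. "myDoc.doc"
        final String language; // e.g. "en", "zh", "pt"
        final String text;     // text of this part only

        Part(String parentId, String language, String text) {
            this.parentId = parentId;
            this.language = language;
            this.text = text;
        }
    }

    private final List<Part> parts = new ArrayList<>();

    /** Registers one language part under its real document. */
    public void add(String parentId, String language, String text) {
        parts.add(new Part(parentId, language, text));
    }

    /** Sums the occurrences of a term over all parts, grouped by real document. */
    public Map<String, Integer> countByParent(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Part part : parts) {
            int n = countOccurrences(part.text.toLowerCase(Locale.ROOT), needle);
            if (n > 0) {
                Integer soFar = totals.get(part.parentId);
                totals.put(part.parentId, soFar == null ? n : soFar + n);
            }
        }
        return totals;
    }

    /** Counts non-overlapping occurrences of needle in haystack. */
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        MultiLangIndexSketch index = new MultiLangIndexSketch();
        index.add("myDoc.doc", "en", "The red fox saw another fox");
        index.add("myDoc.doc", "zh", "\u72d0\u72f8"); // Chinese word for "fox"
        index.add("myDoc.doc", "pt", "A raposa vermelha");
        index.add("other.doc", "en", "Nothing relevant here");
        System.out.println(index.countByParent("fox"));
    }
}
```

In this sketch the parent id travels with each part, so the separate database lookup would not be needed for the mapping itself. It does not solve the proximity problem: a query like “Red NEAR/10 fox” could only match when both words fall inside the same part.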

Thanks in advance,
Ivan

