Hi All,
Our application uses Lucene to index documents, each of which contains
parts written in different languages. For example, a single document
could contain English, Chinese and Brazilian Portuguese text. How should
such a document be indexed? Is there a best practice for this?
What comes to mind is to index 3 separate Lucene Documents for each
real document and to keep, in a database, the meta info that these 3
Documents belong to the same real doc. For example, for myDoc.doc the
index would contain myDocEn.doc, myDocCn.doc and myDocBr.doc, and when
the searched word is found in myDocCn.doc we would show myDoc.doc to the
user. The disadvantage is that the positions of the matched terms would
have to be recalculated relative to the original document. This is
important for proximity queries like “Red NEAR/10 fox”. If someone knows
a better practice than this, please let me know.
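
To make the recalculation problem concrete, here is a rough sketch of
the bookkeeping I think this approach would need. It is plain Java with
no Lucene calls, and the names (LanguageSegmentMap, addRun, toOriginal,
near) are made up for illustration only. It assumes that each
per-language Document is built by concatenating the runs of that
language in document order, and that positions are counted in tokens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene): maps token positions in a
 * per-language sub-document back to positions in the original document.
 */
final class LanguageSegmentMap {

    /** One contiguous run of tokens written in a single language. */
    private static final class Run {
        final int localStart;    // first token position in the sub-document
        final int originalStart; // first token position in the original
        final int length;        // number of tokens in the run

        Run(int localStart, int originalStart, int length) {
            this.localStart = localStart;
            this.originalStart = originalStart;
            this.length = length;
        }
    }

    private final List<Run> runs = new ArrayList<>();
    private int nextLocal = 0; // next free position in the sub-document

    /** Appends a run; runs must be added in document order. */
    void addRun(int originalStart, int length) {
        runs.add(new Run(nextLocal, originalStart, length));
        nextLocal += length;
    }

    /** Translates a sub-document position to an original position. */
    int toOriginal(int localPosition) {
        for (Run r : runs) {
            if (localPosition >= r.localStart
                    && localPosition < r.localStart + r.length) {
                return r.originalStart + (localPosition - r.localStart);
            }
        }
        throw new IllegalArgumentException(
                "position outside sub-document: " + localPosition);
    }

    /** True if two hits are within maxDistance tokens in the original. */
    boolean near(int localA, int localB, int maxDistance) {
        return Math.abs(toOriginal(localA) - toOriginal(localB))
                <= maxDistance;
    }
}
```

For example, suppose tokens 0-4 of myDoc.doc are English, tokens 5-54
are Chinese and tokens 55-59 are English again. In myDocEn.doc the two
English runs become adjacent, so "Red" at local position 4 and "fox" at
local position 5 look like neighbours, although they are 51 tokens
apart in myDoc.doc. A NEAR/10 check on the local positions would wrongly
match, so the distance has to be checked after mapping back.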
Thanks in advance,
Ivan