Indexing of multilingual labels

Stephane Fellah Thu, 10 Mar 2011 19:30:03 -0800

I  am trying to index in Lucene a field that could have label of concepts in
different languages. Most of the approaches I have seen so far are:


   -

   Use a single index, where each document has a field per each language it
   uses, or
   -

   Use M indexes, M being the number of languages in the corpus.

Lucene 2.9+ has a feature called Payload that allows to attach attributes to
term. Is anyone use this mechanism to store language (or other attributes
such as datatypes) information ? Does this approach if labels are the same
in different languages (does it break inverted index) ? How is performance
compared to the two other approaches ? Any pointer on source code showing
how it is done would help.

Thanks

-- 
Stephane Fellah, M.Sc, B.Sc
Principal Engineer/Product Manager
smartRealm LLC
201 Loudoun St. SW
Leesburg, VA 20175
Tel: 703 669 5514
Cell: 571 502 8478
Fax: 703 669 5515

Indexing of multilingual labels

Reply via email to