It's not so much a matter of problems with indexing/searching as it is with search behavior. The reason these strategies are implemented is that using English stemming, say, on other languages will produce "interesting" results.
There's no a-priori reason you can't index multiple languages in the same field. So I don't see what you would accomplish by using payloads to indicate which language the term is in. Could you expand a bit on what you're trying to accomplish here? Maybe there are better solutions.... Best Erick On Thu, Mar 10, 2011 at 10:29 PM, Stephane Fellah <sfel...@smartrealm.com> wrote: > I am trying to index in Lucene a field that could have label of concepts in > different languages. Most of the approaches I have seen so far are: > > - > > Use a single index, where each document has a field per each language it > uses, or > - > > Use M indexes, M being the number of languages in the corpus. > > Lucene 2.9+ has a feature called Payload that allows to attach attributes to > term. Is anyone use this mechanism to store language (or other attributes > such as datatypes) information ? Does this approach if labels are the same > in different languages (does it break inverted index) ? How is performance > compared to the two other approaches ? Any pointer on source code showing > how it is done would help. > > Thanks > > -- > Stephane Fellah, M.Sc, B.Sc > Principal Engineer/Product Manager > smartRealm LLC > 201 Loudoun St. SW > Leesburg, VA 20175 > Tel: 703 669 5514 > Cell: 571 502 8478 > Fax: 703 669 5515 > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org