> > I'm indexing documents, and some of them are provided in several
> > languages. ... Either I create language-specific fields, or I index the
> > translations as separate documents, adding a language field.
> >
> > I chose the second solution. First, translated documents will not be the
> > majority of the documents I need to index. Second, at search time, if I
> > don't want to restrict the search to one language, solution one requires
> > a query with potentially many fields to cover all languages. The second
> > option also makes it faster to filter the results by language, if one is
> > specified.
> >
> > However, with this solution, when the query is not filtered by language
> > and the user searches on fields common to all languages, such as author,
> > I get as many results as I have translations.

If you can afford the space, a simple setup might be one Lucene doc per
"page", with N+1 fields: one field for each existing translation of the
page, plus an additional ALL field holding the union of all the page's
translations.

Then:
- if only language L is requested, search in field L only;
- if no language is specified and the user locale is unknown, search in
  the ALL field;
- if no language is specified but the user locale is known to be L,
  search in both L and ALL, optionally boosting the L part of the query.

HTH,
Doron
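
To make the three cases concrete, here is a minimal sketch in plain Java that only builds a query string in Lucene's classic query parser syntax (`field:(terms)`, `^boost`). Everything named here is an assumption for illustration: the class LanguageQueryRouter, the field name "all" for the ALL field, per-language fields named by language code ("fr", "en", ...), and the boost value 2.0. It also assumes the user's terms have already been escaped for the query parser.

```java
import java.util.Locale;

/** Hypothetical helper: chooses which field(s) to search for a request. */
final class LanguageQueryRouter {
    // Assumed field name for the union-of-all-translations field.
    static final String ALL_FIELD = "all";
    // Arbitrary boost for the user's locale language; tune or drop as needed.
    static final float LOCALE_BOOST = 2.0f;

    /**
     * @param terms         user query terms, assumed already escaped
     * @param requestedLang language explicitly requested, or null
     * @param localeLang    language inferred from the user locale, or null
     */
    static String buildQuery(String terms, String requestedLang, String localeLang) {
        if (requestedLang != null) {
            // Case 1: a language was requested, search that field only.
            return fieldClause(requestedLang, terms);
        }
        if (localeLang == null) {
            // Case 2: no language and no known locale, search ALL.
            return fieldClause(ALL_FIELD, terms);
        }
        // Case 3: no language but a known locale, search both and
        // boost the locale-language clause.
        return fieldClause(localeLang, terms) + "^" + LOCALE_BOOST
                + " OR " + fieldClause(ALL_FIELD, terms);
    }

    private static String fieldClause(String field, String terms) {
        return field.toLowerCase(Locale.ROOT) + ":(" + terms + ")";
    }
}
```

For example, LanguageQueryRouter.buildQuery("lucene index", null, "fr") yields fr:(lucene index)^2.0 OR all:(lucene index). Since each page is a single doc, a match on a language-independent field such as author returns that page once, not once per translation.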