> > I'm indexing documents, and some of them are provided in several
> > languages.   ...   Either, I create
> > languages specific field, either I index the translations in different
> > documents, adding the language field.
> >
> > I choose the second solution, because first, the translated documents will
> > not be the majority of documents that I need to index, second is that at
> > search time, if I don't want to restrict the search to one language, with
> > solution one, I have a query with potentially lot of fields to cover all
> > languages. Also, the second option makes it faster to filter the results by
> > language, if specified.
> >
> > However, with this solution, when the query is not filtered by a language
> > and that the user search for fields common to any language, such as author
> > for instance, I will have as much results as I have translations.

If the extra index space is affordable, a simple setup might be one Lucene
doc per "page" with N+1 fields: one for each existing translation of the
page, plus an additional ALL field holding the union of all the page's
translations. Then, if only language L is requested, search in field L only;
if no language is specified and the user locale is unknown, search in the ALL
field; and if no language is specified but the user locale is known to be L,
search in both L and ALL, optionally boosting the L part of the query.
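
To make the field selection concrete, here is a rough sketch in plain Java
with no Lucene dependency. Everything in it is an assumption made for
illustration: the class and method names are hypothetical, the per-language
fields are simply named after their language code ("en", "fr"), the boost
value is arbitrary, and the output is a string in the classic query parser
syntax (field:(terms)^boost) rather than real Query objects. It also does
not escape special characters in the user's terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: all names here are hypothetical, not part of Lucene. */
final class PerLanguageFields {

    /** Name of the catch-all field (assumed; any reserved name would do). */
    static final String ALL = "ALL";

    /**
     * Index side: given language code -> translated text for one page,
     * return the N+1 field values to store in that page's single document.
     */
    static Map<String, String> fieldsForPage(Map<String, String> translations) {
        Map<String, String> fields = new LinkedHashMap<>(translations);
        // ALL is the union of every translation of the page.
        fields.put(ALL, String.join(" ", translations.values()));
        return fields;
    }

    /**
     * Search side: choose which field(s) to query.
     * requestedLang: language the user explicitly asked for, or null.
     * localeLang:    language inferred from the user locale, or null.
     */
    static String queryFor(String terms, String requestedLang,
                           String localeLang, float boost) {
        if (requestedLang != null) {
            // Explicit language: search that language's field only.
            return requestedLang + ":(" + terms + ")";
        }
        if (localeLang == null) {
            // No language and no locale: search the union field.
            return ALL + ":(" + terms + ")";
        }
        // Locale known: search both, boosting the locale-language clause.
        return localeLang + ":(" + terms + ")^" + boost
                + " " + ALL + ":(" + terms + ")";
    }
}
```

For example, queryFor("war peace", null, "fr", 2.0f) yields
"fr:(war peace)^2.0 ALL:(war peace)". Since each page is a single document,
a query on a language-independent field such as author would match a page
once, not once per translation.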

HTH, Doron

