Quoting Timo Sirainen <t...@iki.fi>:
> 1. Support for multiple languages. Use textcat while indexing to guess the language of the indexed data.
FWIW, you could probably use the Content-Language header (if it exists) to at least give a hint. There's no guarantee it's correct, but it's a better starting place than blindly scanning all languages.
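A minimal sketch of the idea, assuming the indexer has the raw message text and a list of primary language subtags the guesser supports. The function name `candidate_languages` and the `all_languages` list are hypothetical, and textcat itself is not shown. It only reorders the languages so the hinted ones are tried first; nothing is ruled out, since the header may be wrong.

```python
from email import message_from_string


def candidate_languages(raw_message: str, all_languages: list[str]) -> list[str]:
    """Order languages for a textcat-style guesser, trying Content-Language first.

    `all_languages` is a hypothetical list of primary subtags the guesser
    knows about, e.g. ["en", "de", "fi"].
    """
    msg = message_from_string(raw_message)
    header = msg.get("Content-Language")
    hinted: list[str] = []
    if header:
        for tag in header.split(","):
            # Keep only the primary subtag: "de-DE" -> "de"
            primary = tag.strip().split("-")[0].lower()
            if primary in all_languages and primary not in hinted:
                hinted.append(primary)
    # Hinted languages first, then the rest in their original order
    return hinted + [lang for lang in all_languages if lang not in hinted]


raw = "Content-Language: de-DE, en\nSubject: hi\n\nHallo zusammen"
print(candidate_languages(raw, ["en", "fi", "de"]))
```

With no Content-Language header the list comes back unchanged, which is the "scan everything" fallback.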
And, for that matter, you could also leverage Accept-Language (again, if it exists), which might be more useful, since it lists all the languages the user recognizes.
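As a rough sketch, an Accept-Language value can be turned into an ordered list of primary language subtags by sorting on the optional `q` weight (higher first, ties kept in header order). The function name `parse_accept_language` is hypothetical; the wildcard `*` is skipped and `q=0` entries are dropped because they carry no usable hint.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return primary language subtags from an Accept-Language value, highest q first."""
    weighted = []
    for index, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue  # empty entry or wildcard: no language hint
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            # Negative q sorts highest weight first; index keeps ties stable
            weighted.append((-q, index, tag.split("-")[0]))
    ordered: list[str] = []
    for _, _, primary in sorted(weighted):
        if primary not in ordered:
            ordered.append(primary)
    return ordered


print(parse_accept_language("fi, en;q=0.8, de;q=0.5"))
```

The result can be fed to the guesser the same way as a Content-Language hint: try these languages first, then fall back to the rest.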
michael