Thomas: There are some rather extensive threads on this list about the "interesting" issues that come up when indexing and searching languages other than English. I think you'd find it worthwhile to search the list archive for "foreign language" or similar terms.
The short answer, as I remember it, is that there *is* a built-in stemmer, but whether it does what you want when indexing multiple languages depends on what results you expect to get, and I don't remember there being a clear answer.

Erick

On 11/18/06, Thomas Klein <[EMAIL PROTECTED]> wrote:
Hi there,

I'm fairly new to Lucene. I just developed a multi-threaded indexing TCP server that uses Lucene to, hmmm, let me remember, index stuff :)

I have to index not only English but also French and German and, I don't know, perhaps other languages in the future.

Does Lucene use a default stemmer, or do I have to stem the texts before indexing? Does a multi-language stemmer exist?

(Sorry if the answers are in the documentation; I didn't manage to read it fully.)

Thanks in advance!

Thomas Klein
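
To make the "it depends on the language" point in the reply a bit more concrete, here is a minimal sketch of choosing a stemmer per document language before indexing. It is plain Java and does not use any Lucene API. The class name `PerLanguageStemming`, the `analyze` method, and the suffix rules are all hypothetical placeholders invented for illustration; the toy rules are not linguistically correct. In Lucene itself the corresponding piece would be a language-specific analyzer, so check which ones your version ships.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PerLanguageStemming {

    // Hypothetical toy "stemmer": strips the first matching suffix, keeping
    // at least three characters of the token. NOT a real stemming algorithm.
    static String stripSuffix(String token, String... suffixes) {
        for (String s : suffixes) {
            if (token.length() > s.length() + 2 && token.endsWith(s)) {
                return token.substring(0, token.length() - s.length());
            }
        }
        return token;
    }

    // One stemmer per language code; the suffix lists are placeholders.
    static final Map<String, UnaryOperator<String>> STEMMERS = new HashMap<>();
    static {
        STEMMERS.put("en", t -> stripSuffix(t, "ing", "s"));
        STEMMERS.put("fr", t -> stripSuffix(t, "ement", "es", "s"));
        STEMMERS.put("de", t -> stripSuffix(t, "ungen", "en", "e"));
    }

    // Lowercases, splits on non-letters, and applies the stemmer registered
    // for the given language. Unknown languages are left unstemmed.
    static List<String> analyze(String lang, String text) {
        UnaryOperator<String> stemmer =
                STEMMERS.getOrDefault(lang, UnaryOperator.identity());
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(stemmer.apply(raw));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("en", "Indexing documents"));
        System.out.println(analyze("de", "Die Zeitungen"));
        System.out.println(analyze("fr", "rapidement"));
    }
}
```

The design choice this sketch assumes is that the language of each document is known at indexing time and that queries are analyzed with the same language's rules. If the language is unknown or mixed within one document, no single rule set fits, which is presumably why the earlier threads reached no clear answer.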