Hi Daniel,

What makes you say that about language detection? Wouldn't that depend on the language detection approach or tool one uses, and on the type and amount of content one trains the language detector on? And what threshold for "reliable enough" do you have in mind?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Daniel Noll <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, June 5, 2008 7:36:11 PM
> Subject: Re: How international languages are supported in Lucene
>
> > But basically consider why this must be so, especially when
> > stemming. Languages are so variable that you'd get wildly
> > different (and inappropriate) results if you tried to analyze them
> > with the same analyzer. Especially when you get different
> > language encodings in the document.
>
> Well... technically encoding is out of the scope of Lucene since we're
> passing in a Reader.
>
> I have to say though, analysing with the most naive analyser possible (the
> default one with no stop words and no stemming) works well enough.
>
> Language detection isn't at a point where it's reliable enough to use to
> determine which analyser to use automatically.
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]