A shameless self-promotion: http://basistech.com/language-identification/ No, it's not free. Sorry.
We have Lucene-compatible Tokenizers for those languages too:
http://basistech.com/lucene/How-to-build-a-multilingual-search-engine.pdf

Contact me if you have questions.

-kuro

> -----Original Message-----
> From: Bradford Stephens [mailto:bradfordsteph...@gmail.com]
> Sent: Thursday, August 06, 2009 12:46 PM
> To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
> Subject: Language Detection for Analysis?
>
> Hey there,
>
> We're trying to add foreign language support into our new
> search engine -- languages like Arabic, Farsi, and Urdu (that
> don't work with standard analyzers). But our data source
> doesn't tell us which languages we're actually collecting --
> we just get blocks of text. Has anyone here worked on
> language detection so we can figure out what analyzers to
> use? Are there commercial solutions?
>
> Much appreciated!