A bit of shameless self-promotion for our language identifier:
http://basistech.com/language-identification/
No, it's not free. Sorry.

We have Lucene-compatible Tokenizers for those languages too:
http://basistech.com/lucene/How-to-build-a-multilingual-search-engine.pdf

Contact me if you have questions.
-kuro  

> -----Original Message-----
> From: Bradford Stephens [mailto:bradfordsteph...@gmail.com] 
> Sent: Thursday, August 06, 2009 12:46 PM
> To: solr-u...@lucene.apache.org; java-user@lucene.apache.org
> Subject: Language Detection for Analysis?
> 
> Hey there,
> 
> We're trying to add foreign language support into our new 
> search engine -- languages like Arabic, Farsi, and Urdu (that 
> don't work with standard analyzers). But our data source 
> doesn't tell us which languages we're actually collecting -- 
> we just get blocks of text. Has anyone here worked on 
> language detection so we can figure out what analyzers to 
> use? Are there commercial solutions?
> 
> Much appreciated!
