Hi Jan,

It probably makes sense to provide pluggable language detection in Tika,
since it's the lower level library, so I am +1 for figuring out a solution
to implement it in Tika ville. If no one has started on this in the next
few weeks I'll give it a go.

Cheers,
Chris

On Apr 8, 2012, at 4:16 PM, Jan Høydahl wrote:

> In Solr, we made support for pluggable lang detectors, one being Tika's. See
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/langid/
> The detectLanguage() method returns a list of DetectedLanguage objects with a
> normalized certainty between 0.0 and 1.0. Think it's a step in right
> direction.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
[...snip...]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
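
For illustration only, here is a rough Java sketch of the pluggable design described in the quoted message: a detector interface whose detectLanguage() returns a list of DetectedLanguage objects, each carrying a certainty normalized to the range 0.0 to 1.0. Only the names detectLanguage() and DetectedLanguage come from the thread. The LanguageDetector interface, the StopwordDetector class, the field names, and the stopword scoring are all hypothetical, and this is neither Solr's langid code nor a proposed Tika API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class PluggableLangDetect {

    /** Hypothetical result type: a language code plus a certainty in [0.0, 1.0]. */
    public static final class DetectedLanguage {
        private final String langCode;
        private final double certainty;

        public DetectedLanguage(String langCode, double certainty) {
            if (certainty < 0.0 || certainty > 1.0) {
                throw new IllegalArgumentException("certainty must be in [0.0, 1.0]");
            }
            this.langCode = langCode;
            this.certainty = certainty;
        }

        public String getLangCode() { return langCode; }
        public double getCertainty() { return certainty; }
    }

    /** Hypothetical plug-in point: any detector implementation can sit behind it. */
    public interface LanguageDetector {
        List<DetectedLanguage> detectLanguage(String content);
    }

    /** Toy implementation (an assumption, not a real algorithm from Tika or Solr). */
    public static final class StopwordDetector implements LanguageDetector {
        // First element of each row is the language code, the rest are stopwords.
        private static final String[][] STOPWORDS = {
            {"en", "the", "and", "is", "of"},
            {"de", "der", "und", "ist", "das"},
        };

        @Override
        public List<DetectedLanguage> detectLanguage(String content) {
            String[] tokens = content.toLowerCase(Locale.ROOT).split("\\W+");
            int[] hits = new int[STOPWORDS.length];
            int total = 0;
            for (String token : tokens) {
                for (int i = 0; i < STOPWORDS.length; i++) {
                    for (int j = 1; j < STOPWORDS[i].length; j++) {
                        if (STOPWORDS[i][j].equals(token)) {
                            hits[i]++;
                            total++;
                        }
                    }
                }
            }
            // Normalize: each language's share of all stopword hits, so scores sum to 1.0.
            List<DetectedLanguage> result = new ArrayList<>();
            for (int i = 0; i < STOPWORDS.length; i++) {
                if (hits[i] > 0) {
                    result.add(new DetectedLanguage(STOPWORDS[i][0], (double) hits[i] / total));
                }
            }
            // Most certain candidate first.
            result.sort(Comparator.comparingDouble(DetectedLanguage::getCertainty).reversed());
            return result;
        }
    }

    public static void main(String[] args) {
        LanguageDetector detector = new StopwordDetector();
        for (DetectedLanguage d : detector.detectLanguage("The cat is on the roof of the house")) {
            System.out.println(d.getLangCode() + " " + d.getCertainty());
        }
    }
}
```

Callers depend only on the interface, so a different detector could be swapped in without touching them, and returning a ranked list rather than a single guess leaves the certainty threshold up to the caller.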