Hi Jan,

It probably makes sense to provide pluggable language detection in Tika, since 
it's the lower-level library, so I am +1 for figuring out a way to implement 
it in Tika-ville.

If no one has started on this in the next few weeks, I'll give it a go.

Cheers,
Chris

On Apr 8, 2012, at 4:16 PM, Jan Høydahl wrote:

> In Solr, we added support for pluggable language detectors, one being Tika's. See 
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/langid/
> The detectLanguage() method returns a list of DetectedLanguage objects with a 
> normalized certainty between 0.0 and 1.0. I think it's a step in the right 
> direction.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
[...snip...]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
