Hi Jan,
It probably makes sense to provide pluggable language detection in Tika, since
it's the lower-level library, so I am +1 for figuring out a way to implement
it in Tika-ville. If no one has started on this in the next few weeks, I'll
give it a go.
Cheers,
Chris
On Apr 8, 2012, at 4:
In Solr, we added support for pluggable language detectors, one of them being
Tika's. See
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/langid/
The detectLanguage() method returns a list of DetectedLanguage objects with a
normalized certainty between 0.0 and 1.0. Think it's a step in right direct
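
To make the shape of such a plug-in point concrete, here is a minimal sketch of
a detector interface that returns a list of results with a certainty normalized
to the 0.0 to 1.0 range. Only the detectLanguage() and DetectedLanguage names
come from the message above; the LanguageDetector interface, the field and
accessor names, and the toy StopwordDetector are assumptions made up for
illustration and are not the actual Solr or Tika code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical value type: a language code plus a certainty in [0.0, 1.0].
final class DetectedLanguage {
    private final String langCode;
    private final double certainty;

    DetectedLanguage(String langCode, double certainty) {
        if (certainty < 0.0 || certainty > 1.0) {
            throw new IllegalArgumentException("certainty must be in [0.0, 1.0]");
        }
        this.langCode = langCode;
        this.certainty = certainty;
    }

    String getLangCode() { return langCode; }
    double getCertainty() { return certainty; }
}

// Hypothetical plug-in point: each detector implementation sits behind this.
interface LanguageDetector {
    // Returns candidates ordered from most to least certain; may be empty.
    List<DetectedLanguage> detectLanguage(String content);
}

// Toy implementation (an assumption, not a real detector): counts a few
// stopwords per language and normalizes the counts so they sum to 1.0.
final class StopwordDetector implements LanguageDetector {
    private static final List<String> EN = Arrays.asList("the", "and", "is");
    private static final List<String> DE = Arrays.asList("der", "und", "ist");

    @Override
    public List<DetectedLanguage> detectLanguage(String content) {
        int en = 0;
        int de = 0;
        for (String token : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (EN.contains(token)) en++;
            if (DE.contains(token)) de++;
        }
        List<DetectedLanguage> result = new ArrayList<>();
        int total = en + de;
        if (total == 0) {
            return result; // no evidence either way
        }
        if (en > 0) result.add(new DetectedLanguage("en", (double) en / total));
        if (de > 0) result.add(new DetectedLanguage("de", (double) de / total));
        // Highest certainty first.
        result.sort((a, b) -> Double.compare(b.getCertainty(), a.getCertainty()));
        return result;
    }
}
```

With an interface like this, a caller only depends on LanguageDetector, so a
Tika-backed detector or any other implementation could be swapped in without
changing the calling code.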