Hi, I just found this blog post from Mike McCandless about Google's Compact Language Detection code used in Chrome: http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compact.html
There are probably some interesting things to explore in Google's code in order to improve Tika's Language Detection. Has anyone already taken a look at the Google CLD code? http://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/

Best regards,
Jérôme

--
@jcharron
http://motre.ch/
http://jcharron.posterous.com/
http://www.shopreflex.fr/
http://www.staragora.com/