I meant to mention that my algorithm only works for alphabetic languages (which are the ones that give you a harder time anyway?). One issue I wonder about regarding Tika's LanguageIdentifier
tika.apache.org/1.2/api/org/apache/tika/language/LanguageIdentifier.html
is that the API doesn't seem to include a test like:
.isAlphabetic() {true, false}
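
A minimal sketch of the kind of test I mean, using only the JDK. Everything here is an assumption for illustration: the class name ScriptCheck, the method isAlphabetic(String), the choice of which scripts count as non-alphabetic (Han, Hiragana, Katakana), and the simple majority rule. None of it is part of Tika.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of Tika.
public class ScriptCheck {
    // Assumption: scripts treated as non-alphabetic for this sketch.
    private static final Set<UnicodeScript> NON_ALPHABETIC =
            EnumSet.of(UnicodeScript.HAN, UnicodeScript.HIRAGANA, UnicodeScript.KATAKANA);

    // Returns true when more than half of the letters in the text
    // belong to scripts outside the NON_ALPHABETIC set.
    public static boolean isAlphabetic(String text) {
        int letters = 0;
        int alphabetic = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, whitespace, punctuation
            }
            letters++;
            if (!NON_ALPHABETIC.contains(UnicodeScript.of(cp))) {
                alphabetic++;
            }
        }
        return letters > 0 && alphabetic * 2 > letters;
    }

    public static void main(String[] args) {
        System.out.println(isAlphabetic("The quick brown fox"));              // true
        System.out.println(isAlphabetic("\u65e5\u672c\u8a9e\u306e\u6587\u7ae0")); // false
    }
}
```

Note that the JDK's own Character.isAlphabetic(int) is not this test: it checks the Unicode Alphabetic property, which is also true for Han ideographs.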
lbrtchx
