Hi Mike, I don’t see the script - did it get stripped?
Below is a list of the language profiles that I believe are bundled with the language-detector jar that’s pulled in by Tika. I don’t see “gr” - note that Greek is “el”. And there’s “zh-CN” and “zh-TW” vs. just “zh”, but otherwise I’d expect detection to work for your test cases.

-- Ken

af an ar ast be bg bn br ca cs cy da de el en es et eu fa fi
fr ga gl gu he hi hr ht hu id is it ja km kn ko lt lv mk ml
mr ms mt ne nl no oc pa pl pt ro ru sk sl so sq sr sv sw ta
te th tl tr uk ur vi yi zh-CN zh-TW

> On Jan 17, 2019, at 9:39 AM, Mike Thomsen <mikerthom...@gmail.com> wrote:
>
> I wrote a Groovy script (attached) to test a bunch of languages against the
> LanguageDetector class, and these were the results:
>
> ar fa
> de de
> en en
> es es
> fr fr
> gr el
> it it
> ko lt
> nl nl
> ru ru
> zh lt
>
> Is there something that needs to be done to enable the detection of Asian
> languages or should I file this as a bug report?
>
> Thanks,
>
> Mike

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra