[ https://issues.apache.org/jira/browse/TIKA-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320264#comment-17320264 ]
Tim Allison commented on TIKA-3340:
-----------------------------------

I wound up bringing in more languages from: http://data.statmt.org/cc-100/

I documented my process here: https://github.com/apache/tika/blob/main/tika-langdetect/tika-langdetect-opennlp/src/main/java/org/apache/tika/langdetect/opennlp/OpenNLPDetector.java

I'll push the commit with the updated common tokens files shortly, once I get a clean build locally.

> LanguageProfile for Myanmar
> ---------------------------
>
>                 Key: TIKA-3340
>                 URL: https://issues.apache.org/jira/browse/TIKA-3340
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Arky
>            Priority: Major
>         Attachments: 20210401-model.report.txt, 20210413.report.txt, table-summarized-truncated.txt.gz
>
>
> A language profile for detecting Myanmar/Burmese (my).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)