[ 
https://issues.apache.org/jira/browse/TIKA-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320264#comment-17320264
 ] 

Tim Allison commented on TIKA-3340:
-----------------------------------

I wound up bringing in more languages from: http://data.statmt.org/cc-100/

I documented my process here: 
https://github.com/apache/tika/blob/main/tika-langdetect/tika-langdetect-opennlp/src/main/java/org/apache/tika/langdetect/opennlp/OpenNLPDetector.java

I'll push the commit with the updated common tokens files shortly, once I get a 
clean build locally.

> LanguageProfile for Myanmar
> ---------------------------
>
>                 Key: TIKA-3340
>                 URL: https://issues.apache.org/jira/browse/TIKA-3340
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>            Reporter: Arky
>            Priority: Major
>         Attachments: 20210401-model.report.txt, 20210413.report.txt, 
> table-summarized-truncated.txt.gz
>
>
> A language profile for detecting Myanmar/Burmese (my).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
