[ https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045312#comment-13045312 ]
Jan Høydahl commented on TIKA-568:
----------------------------------

Or perhaps better, a getProbability() method returning the probability of the language being accurate, with 1.0 being 100% probable. It's easier for clients to relate to a scale between 0.0 and 1.0, and then we also hide the implementation detail of the distance algorithm.

> Language Detection isReasonablyCertain() hides valuable information
> -------------------------------------------------------------------
>
>                 Key: TIKA-568
>                 URL: https://issues.apache.org/jira/browse/TIKA-568
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: TIKA-568.patch
>
>
> LanguageIdentifier.isReasonablyCertain() hardcodes a threshold for language
> detection, which is fine, except applications should be allowed to decide
> what threshold suits them.  For instance, how was 0.022 decided?
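
To make the getProbability() suggestion concrete, here is a rough sketch of what such an accessor could look like. Everything in it is an assumption for illustration: the class name LanguageScore, the maxDistance parameter, and the linear rescaling are hypothetical and are not taken from Tika or from the attached patch. It only assumes that the detector produces a non-negative distance where smaller means a closer match.

```java
/**
 * Hypothetical helper, not part of Tika: wraps a detected language and its
 * profile distance, and exposes the distance as a score between 0.0 and 1.0.
 */
public final class LanguageScore {
    private final String language;
    private final double distance;     // smaller means a closer match (assumed)
    private final double maxDistance;  // distance treated as "no match" (assumed)

    public LanguageScore(String language, double distance, double maxDistance) {
        if (distance < 0.0) {
            throw new IllegalArgumentException("distance must be >= 0");
        }
        if (maxDistance <= 0.0) {
            throw new IllegalArgumentException("maxDistance must be > 0");
        }
        this.language = language;
        this.distance = distance;
        this.maxDistance = maxDistance;
    }

    public String getLanguage() {
        return language;
    }

    /**
     * Linear rescaling of the distance: 1.0 at distance 0, 0.0 at or beyond
     * maxDistance. This is a normalized score, not a calibrated probability.
     */
    public double getProbability() {
        double scaled = 1.0 - distance / maxDistance;
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    /** Lets the caller pick the threshold instead of hardcoding one. */
    public boolean isCertain(double minProbability) {
        return getProbability() >= minProbability;
    }
}
```

With something like this, a client would call isCertain(0.9) or isCertain(0.5) depending on its own tolerance and would never see the raw distance. Whether a linear mapping is a fair stand-in for a probability is an open question; a calibrated mapping would need evaluation data.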