[ https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928854#comment-17928854 ]
Tim Allison commented on TIKA-4370:
-----------------------------------

I looked around some more. As mentioned, the current text detector is very tightly coupled with the other steps within MimeTypes, and MimeTypes, itself, is hardcoded to be the last detector. This code goes back to well before I was on the project, and it causes me great concern to make the serious rewrites it would take to modularize the logic in MimeTypes.

That said, fixing this for shift_jis would also fix utf-16 and probably several other charsets.

I'll try to take another look tomorrow.

> SJIS Encoded Files Can't be Detected
> ------------------------------------
>
>        Key: TIKA-4370
>        URL: https://issues.apache.org/jira/browse/TIKA-4370
>    Project: Tika
> Issue Type: Bug
>   Reporter: Subbu
>   Priority: Major
>
> When character encoding of file is SJIS, without file name in the metadata,
> most files content-type detected as application/octet-stream. Is there zero
> support for SJIS?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)