[ 
https://issues.apache.org/jira/browse/TIKA-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923606#comment-17923606
 ] 

Subbu commented on TIKA-4370:
-----------------------------

_Unless I misunderstand your point, under the hood in Tika, that's the problem._
Sorry, I didn't know that. Does all detection go through TextDetector first to 
determine whether a file is textual? 

We disabled the use of file name metadata and started seeing this issue with 
txt and csv files, and we believe with xls/pdf files as well. 


_There's no mime magic or BOM for shift-jis text files?_
I don't think there are any for Shift_JIS in 
[https://github.com/apache/tika/blob/main/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml]

I only checked the text/plain entry, though.
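
To illustrate why the lack of a BOM or magic matters here, below is a rough Python sketch of a generic "mostly ASCII or valid UTF-8" text heuristic applied to Shift_JIS bytes. This is not Tika's TextDetector. The function names (looks_like_utf8, ascii_ratio, naive_is_text) and the 0.9 threshold are assumptions made up for this example, so treat it as an approximation of the idea only.

```python
# Illustrative only: a simplified stand-in for a generic text heuristic.
# It is NOT Tika's TextDetector; names and threshold are hypothetical.

def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def ascii_ratio(data: bytes) -> float:
    """Fraction of bytes below 0x80 (1.0 for empty input)."""
    if not data:
        return 1.0
    return sum(1 for b in data if b < 0x80) / len(data)


def naive_is_text(data: bytes, min_ascii: float = 0.9) -> bool:
    """Treat data as text if it is mostly ASCII or valid UTF-8."""
    return ascii_ratio(data) >= min_ascii or looks_like_utf8(data)


sample = "日本語のテキストです。"
sjis = sample.encode("shift_jis")  # no BOM is written for Shift_JIS
utf8 = sample.encode("utf-8")

print("UTF-8 sample treated as text:    ", naive_is_text(utf8))
print("Shift_JIS sample treated as text:", naive_is_text(sjis))
```

With this heuristic the UTF-8 bytes pass and the Shift_JIS bytes do not, even though both encode the same sentence. Without a file name hint or a magic entry, a detector that relies on a check of this kind could plausibly fall back to application/octet-stream.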

[~tallison] 
 
 

> SJIS Encoded Files Can't be Detected
> ------------------------------------
>
>                 Key: TIKA-4370
>                 URL: https://issues.apache.org/jira/browse/TIKA-4370
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Subbu
>            Priority: Major
>
> When a file's character encoding is SJIS and there is no file name in the 
> metadata, the content type of most files is detected as 
> application/octet-stream. Is there no support for SJIS at all? 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
