[ https://issues.apache.org/jira/browse/TIKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562661#comment-17562661 ]
Giorgiana Ciobanu commented on TIKA-3811:
-----------------------------------------
Thanks [~tallison], I will use detection with an input stream as you recommended.
Yes, for a future version, it would be great not to include file name
detection in mime types and instead have it in a separate Detector added to the
default one, maybe?
Thanks [~nick] for forwarding this to [~tallison].
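
For readers following along, the stream-based detection mentioned above might look roughly like the sketch below. It assumes the default Tika facade; a Tika built from a custom TikaConfig can be passed in the same way. The class name, the helper method, and the temporary sample file are made up for illustration and do not come from the issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.tika.Tika;

public class ContentOnlyDetection {

    // Detects the media type from the bytes alone. Only a stream is passed,
    // so this call gives Tika no file name to use as a hint.
    static String detectByContent(Tika tika, Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return tika.detect(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: plain text stored under a misleading .vtt name.
        Path sample = Files.createTempFile("invalid_format", ".vtt");
        Files.write(sample, "just some plain text\n".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(detectByContent(new Tika(), sample));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The difference from Tika.detect(File) is only in what the detector gets to see: the File overload also hands over the file name, while the InputStream overload passes the content alone.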
> Exclude NameDetector not working for Tika.detect(file)
> ------------------------------------------------------
>
> Key: TIKA-3811
> URL: https://issues.apache.org/jira/browse/TIKA-3811
> Project: Tika
> Issue Type: Bug
> Components: config, core, detector
> Affects Versions: 2.3.0
> Reporter: Giorgiana Ciobanu
> Priority: Major
> Attachments: invalid_format.vtt, tika-config_test.xml
>
>
> I need to detect the mime type of a file, but for security reasons I want to
> exclude detection by file name extension.
> I added a tika-config_test.xml (see attached) to my unit test, but it still
> detects the file by its name extension.
> I attached a test file that is wrongly detected as text/vtt because of its
> file extension; it should be text/plain in this case.
>
> The code of my unit test:
> {code:java}
> File file = new File(
>         getClass().getClassLoader().getResource("invalid_format.vtt").getFile());
> TikaConfig tikaConfig = new TikaConfig(
>         getClass().getClassLoader().getResourceAsStream("tika-config_test.xml"));
>
> // returns text/vtt but should be text/plain
> String mimeType = new Tika(tikaConfig).detect(file);
> {code}
>