Caleb Postlethwait created TIKA-3732:
----------------------------------------
Summary: Word doc MediaType detected as RTF
Key: TIKA-3732
URL: https://issues.apache.org/jira/browse/TIKA-3732
Project: Tika
Issue Type: Bug
Components: detector
Affects Versions: 2.2.1
Reporter: Caleb Postlethwait
Attachments: example.DOC
When calling Detector.detect(InputStream input, Metadata metadata) on a
particular Word document, we get back a MediaType of RTF, which causes
problems for us downstream.
Here's the relevant bit of code:
TikaConfig config = TikaConfigFactory.getTikaConfig();
Detector detector = config.getDetector();
Metadata metadata = new Metadata();
FileInputStream fis = new FileInputStream(paths);
TikaInputStream stream = TikaInputStream.get(fis);
metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, paths);
MediaType mediaType = detector.detect(stream, metadata);
The file on which we encountered this issue is attached (example.DOC).
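One quick way to triage a result like this is to check which signature the file actually starts with. A binary Word 97-2003 .doc is an OLE2 compound file beginning with the bytes D0 CF 11 E0 A1 B1 1A E1, while an RTF file begins with the ASCII text {\rtf. Word can save RTF content under a .doc name, so if the attachment starts with {\rtf, an RTF result may actually be correct. The following is a minimal sketch using only the JDK; it is not Tika's own detection logic, and the class name MagicCheck, the helper classify, and the default path "example.DOC" are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic: reports which known signature a file's leading bytes match.
public class MagicCheck {
    // OLE2 compound document signature used by binary Word 97-2003 .doc files
    static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // RTF files begin with the ASCII text "{\rtf"
    static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // Classify a file by its first few bytes
    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 (binary .doc)";
        if (startsWith(head, RTF)) return "RTF";
        return "unknown";
    }

    // True if data begins with every byte of prefix
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the attached file in the working directory
        Path path = Paths.get(args.length > 0 ? args[0] : "example.DOC");
        byte[] head = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.readNBytes(head, 0, head.length); // may read fewer than 8 bytes
        }
        System.out.println(path + ": " + classify(Arrays.copyOf(head, n)));
    }
}
```

Run it as `java MagicCheck example.DOC`; it prints the detected signature family, which tells you whether the RTF result points at a detector bug or at RTF content behind a .doc extension.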