Caleb Postlethwait created TIKA-3732:
----------------------------------------

             Summary: Word doc MediaType detected as RTF
                 Key: TIKA-3732
                 URL: https://issues.apache.org/jira/browse/TIKA-3732
             Project: Tika
          Issue Type: Bug
          Components: detector
    Affects Versions: 2.2.1
            Reporter: Caleb Postlethwait
         Attachments: example.DOC

When calling Detector.detect(InputStream input, Metadata metadata) on a
particular Word document, we get back a MediaType of RTF, which causes problems
in our downstream processing.
Here's the relevant bit of code:

// TikaConfigFactory is an application-specific helper that returns a TikaConfig
TikaConfig config = TikaConfigFactory.getTikaConfig();
Detector detector = config.getDetector();
Metadata metadata = new Metadata();
metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, paths);
try (TikaInputStream stream = TikaInputStream.get(new FileInputStream(paths))) {
    // For the attached example.DOC, this call returns RTF instead of a Word type
    MediaType mediaType = detector.detect(stream, metadata);
}

I've attached the file on which we encountered this issue (example.DOC).
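
One way to triage this is to check the attached file's leading bytes. Legacy binary Word .doc files are OLE2 compound documents that start with the signature D0 CF 11 E0 A1 B1 1A E1, while RTF files start with the ASCII text "{\rtf". Files with a .doc extension are sometimes actually RTF. The sketch below is a minimal, JDK-only check of just those two signatures. The class name SignatureCheck is hypothetical, and the code is not part of Tika and not a substitute for Tika's own detection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical triage helper: reports which of two known signatures a file starts with.
public class SignatureCheck {
    // OLE2 compound-document header used by legacy binary Word (.doc) files
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // RTF documents begin with the ASCII text "{\rtf"
    static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    // True if the first magic.length bytes of head equal magic
    static boolean startsWith(byte[] head, byte[] magic) {
        return head.length >= magic.length
            && Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }

    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 (binary Word .doc)";
        if (startsWith(head, RTF)) return "RTF";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        byte[] head;
        // Read at most the first 8 bytes, enough for both signatures
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            head = in.readNBytes(8);
        }
        System.out.println(args[0] + ": " + classify(head));
    }
}
```

With Java 11 or later it can be run directly as `java SignatureCheck.java example.DOC`. If it prints RTF, the detector's answer matches the file's content and the mismatch is only with the .DOC extension. If it prints OLE2, the file really is a binary Word document and the RTF result points at the detector.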



--
This message was sent by Atlassian Jira
(v8.20.7#820007)