Sam Stephens created TIKA-3711:
----------------------------------

             Summary: Image file names included in parsed Word Document text
                 Key: TIKA-3711
                 URL: https://issues.apache.org/jira/browse/TIKA-3711
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.3.0
            Reporter: Sam Stephens
         Attachments: word-doc-with-image.docx

The attached Word document includes nothing but a single image. Running it 
through the Tika 2.2.0 AutoDetectParser correctly returns no text. Running it 
through the Tika 2.3.0 AutoDetectParser returns the text:


{{image1.png}}
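
The returned string looks like the name of an embedded media part, though that is an assumption and not something confirmed in this report. A .docx file is a ZIP package, and Word conventionally stores embedded images under word/media/. The following is a minimal sketch, using only the JDK, for listing those part names so they can be compared with the unexpected text. The class name DocxMediaLister and the method listMediaParts are hypothetical helpers for illustration and are not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper; not part of Tika.
final class DocxMediaLister {

    // Assumption: embedded images live under this prefix, as Word conventionally writes them.
    private static final String MEDIA_PREFIX = "word/media/";

    // Returns the file names of all entries found under word/media/ in the package.
    static List<String> listMediaParts(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith(MEDIA_PREFIX)) {
                    names.add(name.substring(MEDIA_PREFIX.length()));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the attachment name from this issue when no path is given.
        String path = args.length > 0 ? args[0] : "word-doc-with-image.docx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            for (String name : listMediaParts(in)) {
                System.out.println(name);
            }
        }
    }
}
```

If the listing for the attached document contains image1.png, that would be consistent with the 2.3.0 parser emitting the embedded part's file name into the text output. It would not by itself identify where in the parser the change occurred.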

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
