[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519749#comment-17519749
]
Tim Allison commented on TIKA-3711:
-----------------------------------
I'm going to address this in two commits. The first will make it configurable
whether the file names are written to the streams. The second will come out of
a review of the offending commit:
https://github.com/apache/tika/commit/118734a1317fa13ad66959fdc28969ca50a49643
In that review I need to check the cases where the calling parser has already
written XHTML tags.
> Image file names included in parsed Word Document text
> ------------------------------------------------------
>
> Key: TIKA-3711
> URL: https://issues.apache.org/jira/browse/TIKA-3711
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 2.3.0
> Reporter: Sam Stephens
> Priority: Major
> Attachments: word-doc-with-image-from-word-365.docx,
> word-doc-with-image.docx
>
>
> The attached Word document includes nothing but a single image. Running it
> through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it
> through the Tika 2.3.0 AutoDetectParser returns the text:
> {{image1.png}}
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)