[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519802#comment-17519802 ]
Hudson commented on TIKA-3711:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #512 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/512/])
TIKA-3711 -- allow configuration of EmbeddedDocumentExtractors via
tika-config.xml (tallison:
[https://github.com/apache/tika/commit/ccc7bd841e097c3aa6d0c7c8494ddc5fa7596619])
* (edit) tika-core/src/main/java/org/apache/tika/extractor/ParsingEmbeddedDocumentExtractor.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParserConfig.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/ParsingEmbeddedDocumentExtractorFactory.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentUtil.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-with-names.xml
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-no-names.xml
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentExtractorFactory.java
TIKA-3711 -- allow configuration of EmbeddedDocumentExtractors via
tika-config.xml -- review and correct places where outputHtml should be false.
(tallison:
[https://github.com/apache/tika/commit/6552b076f0b4987423710b72b8917150422ea112])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/pkg/ZipParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteTreeWalker.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/AbstractPOIFSExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/XML2003ParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/ImageGraphicsEngine.java
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
> Image file names included in parsed Word Document text
> ------------------------------------------------------
>
> Key: TIKA-3711
> URL: https://issues.apache.org/jira/browse/TIKA-3711
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.3.0
> Reporter: Sam Stephens
> Priority: Major
> Fix For: 2.4.0
>
> Attachments: word-doc-with-image-from-word-365.docx,
> word-doc-with-image.docx
>
>
> The attached Word document includes nothing but a single image. Running it
> through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it
> through the Tika 2.3.0 AutoDetectParser returns the text:
> {{image1.png}}
>