[ https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133832#comment-14133832 ]
Tim Allison commented on TIKA-1396:
-----------------------------------

I just tested the Tika 1.6 app jar on "testPDF_childAttachments.pdf", and it works on that file.

To turn on extraction of embedded images with the app jar, you have to unzip it and change the value of extractInlineImages to true in this file:
org/apache/tika/parser/pdf/PDFParser.properties

You also mentioned that you are trying to set the config value programmatically. A full working example is available in testEmbeddedFilesInChildren in [PDFParserTest|http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java]

Two thoughts:
1) The PDF parser code and/or Tika's wrapper around it might be failing on your particular document(s). Are you able to share them?
2) Is your recursive handler/attachment handler working on attachments in other document formats?

> Embedded images in PDF documents
> --------------------------------
>
>                 Key: TIKA-1396
>                 URL: https://issues.apache.org/jira/browse/TIKA-1396
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: *OS:*
> Ubuntu 14.04.1 LTS
> *KERNEL:*
> 3.13.0-33-generic
> gcc version 4.8.2
> *JAVA:*
> java version "1.8.0_11"
> Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
>            Reporter: Damiano
>            Priority: Critical
>             Fix For: 1.6
>
>
> Hello!
> I just found a problem with PDF documents that have embedded images.
> Doing:
> java -jar tika-app-1.5.jar --extract tika.pdf
> Tika can not find the image.
> Is this a PDF related problem? Because if i do the same operation with a DOC
> document Tika finds the image correctly.
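
The unzip-and-edit step described in the comment can also be done in place, without unpacking the jar by hand. The following is only a rough sketch using the JDK's zip file system, not something taken from the Tika code base. The class name EnableInlineImages and its enable method are hypothetical, and the sketch assumes that the properties entry sits under the parser's package path (org/apache/tika/parser/pdf/) and that the key is extractInlineImages, as named in the comment. Note that Properties.store rewrites the file, so any comments in the original entry are lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: flips extractInlineImages to true inside a Tika app jar.
final class EnableInlineImages {

    // Assumed location of the PDF parser settings inside the jar.
    static final String ENTRY = "org/apache/tika/parser/pdf/PDFParser.properties";

    static void enable(Path jar) throws IOException {
        // Mount the jar as a file system; changes are written back on close.
        try (FileSystem zip = FileSystems.newFileSystem(jar, (ClassLoader) null)) {
            Path entry = zip.getPath(ENTRY);

            // Load the existing settings so the other keys are preserved.
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(entry)) {
                props.load(in);
            }

            props.setProperty("extractInlineImages", "true");

            // Overwrite the entry with the updated settings.
            try (OutputStream out = Files.newOutputStream(entry)) {
                props.store(out, null);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EnableInlineImages tika-app-1.6.jar
        enable(Paths.get(args[0]));
    }
}
```

Working on a copy of the jar is advisable, since the entry is modified in place. After that, running the patched jar with --extract as in the original report should pick up the new setting, provided the entry path assumed above matches the jar in use.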