[jira] [Commented] (TIKA-1396) Embedded images in PDF documents

Tim Allison (JIRA) Mon, 15 Sep 2014 05:26:53 -0700

    [ https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133832#comment-14133832 ]


Tim Allison commented on TIKA-1396:
-----------------------------------

I just tested the Tika 1.6 app jar on "testPDF_childAttachments.pdf", and it 
works on that file. To turn on extraction of embedded images with the tika-app 
jar, you have to unzip it and change the value of extractInlineImages to true 
in this file: org/apache/tika/parser/pdf/PDFParser.properties
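Unzipping and re-zipping by hand works. As a minimal sketch, the same edit can 
also be scripted with the JDK's built-in zip file system, without Tika on the 
classpath. It assumes the properties entry sits at the package path above 
inside the app jar; the class name EnableInlineImages and the method patchJar 
are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Properties;

public class EnableInlineImages {

    // Assumed location of the PDF parser defaults inside the app jar
    // (same package as org.apache.tika.parser.pdf.PDFParserConfig).
    static final String ENTRY = "/org/apache/tika/parser/pdf/PDFParser.properties";

    /** Sets extractInlineImages=true in the jar's PDFParser.properties, in place. */
    static void patchJar(Path jar) throws IOException {
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap())) {
            Path entry = zipFs.getPath(ENTRY);
            Properties props = new Properties();
            try (InputStream in = Files.newInputStream(entry)) {
                props.load(in);
            }
            props.setProperty("extractInlineImages", "true");
            // Properties.store rewrites the whole entry: values survive,
            // but the original comments and key order do not.
            try (OutputStream out = Files.newOutputStream(entry)) {
                props.store(out, "extractInlineImages enabled");
            }
        } // closing the zip file system writes the modified entry back to the jar
    }

    public static void main(String[] args) throws IOException {
        patchJar(Paths.get(args.length > 0 ? args[0] : "tika-app-1.6.jar"));
    }
}
```

Usage: run `java EnableInlineImages tika-app-1.6.jar`, then run 
`java -jar tika-app-1.6.jar --extract` on the PDF as before.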

You also mentioned that you are trying to set the config value 
programmatically; a full working example is available in 
testEmbeddedFilesInChildren in 
[PDFParserTest|http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java]
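For a quick check outside the test suite, here is a minimal sketch of the 
programmatic route, assuming the tika-parsers 1.6 jar and its dependencies are 
on the classpath. It is not the code from testEmbeddedFilesInChildren: it calls 
PDFParserConfig.setExtractInlineImages(true), the programmatic counterpart of 
the extractInlineImages property (false by default), puts that config on a 
ParseContext, and registers a custom EmbeddedDocumentExtractor that only 
records the name of each attachment or inline image it is handed. The names 
InlineImageExtraction, NameCollector and buildContext are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public class InlineImageExtraction {

    /** Hypothetical extractor: records each embedded resource's name, leaves the stream unread. */
    static class NameCollector implements EmbeddedDocumentExtractor {
        final List<String> names = new ArrayList<>();

        @Override
        public boolean shouldParseEmbedded(Metadata metadata) {
            return true; // accept every attachment and inline image
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Metadata metadata, boolean outputHtml) {
            String name = metadata.get(Metadata.RESOURCE_NAME_KEY);
            names.add(name != null ? name : "(unnamed)");
        }
    }

    /** Builds a ParseContext with inline-image extraction turned on. */
    static ParseContext buildContext(NameCollector collector) {
        PDFParserConfig config = new PDFParserConfig(); // starts from PDFParser.properties defaults
        config.setExtractInlineImages(true);
        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, config);
        context.set(EmbeddedDocumentExtractor.class, collector);
        return context;
    }

    public static void main(String[] args) throws Exception {
        Path input = Paths.get(args[0]);
        NameCollector collector = new NameCollector();
        try (InputStream in = Files.newInputStream(input)) {
            new AutoDetectParser().parse(in, new BodyContentHandler(-1),
                    new Metadata(), buildContext(collector));
        }
        System.out.println(collector.names);
    }
}
```

Because the extractor is looked up from the ParseContext rather than being 
specific to the PDF parser, running the same program on a DOC file should also 
show whether attachment handling works for other formats.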

Two thoughts:
1) The PDF parser code and/or Tika's wrapper around it might be failing on your 
particular document(s). Are you able to share them?
2) Is your recursive handler/attachment handler working on attachments in other 
document formats?

> Embedded images in PDF documents
> --------------------------------
>
>                 Key: TIKA-1396
>                 URL: https://issues.apache.org/jira/browse/TIKA-1396
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: *OS:* 
> Ubuntu 14.04.1 LTS
> *KERNEL:*
> 3.13.0-33-generic 
> gcc version 4.8.2
> *JAVA:*
> java version "1.8.0_11"
> Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
>            Reporter: Damiano
>            Priority: Critical
>             Fix For: 1.6
>
>
> Hello!
> I just found a problem with PDF documents that have embedded images.
> Running:
> java -jar tika-app-1.5.jar --extract tika.pdf
> Tika cannot find the image.
> Is this a PDF-related problem? If I do the same operation on a DOC 
> document, Tika finds the image correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
