[ https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158223#comment-14158223 ]

Tim Allison commented on TIKA-1427:
-----------------------------------

On at least one test doc, I'm getting correct behavior:

{noformat}
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="meta:creation-date" content="2002-12-31T14:13:29Z" />
...
</head>
<body><div class="page"><p />
<p> </p>
<p>What is a generic drug?</p>
...
<p>generic drugs. </p>
<p />
<img src="embedded:image0.png" alt="image0.png" /><img src="embedded:image1.png" alt="image1.png" /><img src="embedded:image2.png" alt="image2.png" /></div>
<div class="page"><p />
...
<p>Generic Drugs: Safe. Effective. FDA Approved.</p>
<p />
<img src="embedded:image3.png" alt="image3.png" /><img src="embedded:image4.png" alt="image4.png" /></div>
<ul>
<li>Local Disk</li>
<ul>
<li>Generic Drugs</li>
</ul>
</ul>
</body></html>
{noformat}

Can you attach an example of a file that is failing?

> PDF Images don't appear in structured view
> ------------------------------------------
>
>                 Key: TIKA-1427
>                 URL: https://issues.apache.org/jira/browse/TIKA-1427
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: James Baker
>            Assignee: Tim Allison
>              Labels: pdf
>
> When viewing, say, a Word Document, any images appear in the 'structured
> view' of the document as <img> tags. The same is not true of PDF documents,
> and we lose both the fact that there is an image present, and where it is in
> the document.
> Some discussion of this issue in the comments of TIKA-1396.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)