Tim Allison created TIKA-1376:
---------------------------------

             Summary: Improve embedded file name extraction in PDFParser
                 Key: TIKA-1376
                 URL: https://issues.apache.org/jira/browse/TIKA-1376
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Trivial
             Fix For: 1.6


When we extract embedded files from PDFs, we currently use the key in the 
PDEmbeddedFilesNameTreeNode as the file name, and we store that key as the 
value of Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.

I think we should try to get the file name from PDComplexFileSpecification's 
getFilename() first.  If that is null, then we should fall back to the key 
value.
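
A minimal sketch of the proposed fallback, not the actual PDFParser change. 
The class and method names (EmbeddedNameResolver, resolveName) are 
hypothetical. It takes plain strings so that it runs without PDFBox on the 
classpath; in the parser, the first argument would presumably be the result 
of PDComplexFileSpecification's getFilename() and the second the name tree 
key.

```java
// Hypothetical helper illustrating the fallback described above.
// Not part of Tika or PDFBox; the names are assumptions for illustration.
final class EmbeddedNameResolver {

    private EmbeddedNameResolver() {
        // static utility, no instances
    }

    /**
     * Picks the name to store under Metadata.RESOURCE_NAME_KEY.
     *
     * @param specFileName value assumed to come from
     *                     PDComplexFileSpecification.getFilename(); may be null
     * @param nameTreeKey  key from the PDEmbeddedFilesNameTreeNode
     * @return specFileName if it is non-null, otherwise nameTreeKey
     */
    static String resolveName(String specFileName, String nameTreeKey) {
        // Prefer the file specification's own name.
        if (specFileName != null) {
            return specFileName;
        }
        // Fall back to the current behavior: use the name tree key.
        return nameTreeKey;
    }
}
```

The sketch falls back only when the file name is null, as proposed above. 
Whether an empty file name should also trigger the fallback is left open 
here.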



--
This message was sent by Atlassian JIRA
(v6.2#6252)
