[ https://issues.apache.org/jira/browse/TIKA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003745#comment-18003745 ]
Tim Allison commented on TIKA-4449:
-----------------------------------

[~peterhoogendijk] sounds good. I just merged this into main and cherry-picked into 3x. The next snapshot build for both should have it. Please reopen this issue if I didn't fix it fully/correctly. Thank you.

> Improve xmp metadata key precision for PDFs
> -------------------------------------------
>
>                 Key: TIKA-4449
>                 URL: https://issues.apache.org/jira/browse/TIKA-4449
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> PDFs (and other file formats) may have conflicting information within them
> about, for example, the "title" field or the "author" field.
> Tika's parsers typically pick one source over another and normalize the keys
> to dublin core or other standards.
> [~peterhoogendijk] and other users (likely?) want to be able to identify
> whether a given piece of information comes from the XMP or the docinfo. This
> is follow-on work from TIKA-4444. The proposal is to add new metadata keys to
> specify when dublin core information comes directly from xmp.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)