[jira] [Commented] (TIKA-4449) Improve xmp metadata key precision for PDFs

Tim Allison (Jira) Tue, 08 Jul 2025 04:57:06 -0700


    [ https://issues.apache.org/jira/browse/TIKA-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003745#comment-18003745 ]


Tim Allison commented on TIKA-4449:
-----------------------------------

[~peterhoogendijk] sounds good. I just merged this into main and cherry-picked 
into 3x. The next snapshot build for both should have it. Please reopen this 
issue if I didn't fix it fully/correctly. Thank you.

> Improve xmp metadata key precision for PDFs
> -------------------------------------------
>
>                 Key: TIKA-4449
>                 URL: https://issues.apache.org/jira/browse/TIKA-4449
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> PDFs (and other file formats) may contain conflicting information about, for 
> example, the "title" field or the "author" field.
> Tika's parsers typically pick one source over another and normalize the keys 
> to Dublin Core or other standards.
> [~peterhoogendijk] and (likely) other users want to be able to identify 
> whether a given piece of information comes from the XMP or the docinfo. This 
> is follow-on work from TIKA-4444. The proposal is to add new metadata keys to 
> specify when Dublin Core information comes directly from XMP.
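
To illustrate the proposal in the description above, here is a minimal,
language-neutral sketch using plain Python dictionaries. It is not Tika's
implementation or API. The prefix "xmp-source:", the function name
merge_metadata, the sample values, and the choice that docinfo wins on
conflict are all assumptions made for illustration; the actual key names and
precedence in Tika may differ.

```python
# Hypothetical prefix for provenance keys; the real key names may differ.
XMP_PREFIX = "xmp-source:"


def merge_metadata(docinfo, xmp):
    """Merge docinfo and XMP values, keeping XMP provenance in extra keys.

    Both arguments map normalized keys (e.g. "dc:title") to string values.
    """
    merged = {}
    # Start with the XMP values under the normalized keys.
    merged.update(xmp)
    # Assumption for this sketch: docinfo overrides XMP on conflict.
    merged.update(docinfo)
    # Additionally record every XMP value under a source-specific key,
    # so a consumer can tell what came directly from the XMP packet.
    for key, value in xmp.items():
        merged[XMP_PREFIX + key] = value
    return merged


# Made-up sample values with a conflicting title.
docinfo = {"dc:title": "Quarterly Report"}
xmp = {"dc:title": "Q3 Report (draft)", "dc:creator": "A. Author"}
merged = merge_metadata(docinfo, xmp)
```

With this layout, a consumer that only reads "dc:title" sees the normalized
value, while one that cares about provenance can also read
"xmp-source:dc:title" and compare the two.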



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
