[ 
https://issues.apache.org/jira/browse/TIKA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900052#comment-17900052
 ] 

Tim Allison commented on TIKA-4357:
-----------------------------------

I just updated the PDF and HTML parsers. I still need to review the PDFParser 
in more detail, and then we can turn to the other parsers (RTF? mail?).

> Ensure namespace prefixes in metadata keys in 4.x
> -------------------------------------------------
>
>                 Key: TIKA-4357
>                 URL: https://issues.apache.org/jira/browse/TIKA-4357
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> There are several places in the codebase where we blindly trust a file's 
> metadata keys and store them without a namespace prefix. In other places, we 
> started transitioning to namespace prefixes but left the legacy unprefixed 
> keys in place 
> (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
>  
> In 4.x, we should look through the codebase and ensure that we are prefixing 
> custom metadata keys.
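
For illustration only, here is a minimal sketch of what prefixing 
file-supplied keys could look like. It uses plain Java maps rather than Tika's 
own classes; the class name MetadataKeyPrefixer, the method names, and the 
"custom:" prefix are all hypothetical and are not taken from the Tika 
codebase. The idea is that a key read from a file is never stored as-is, so a 
document cannot supply a key that collides with a standard one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: one way to namespace raw metadata keys.
final class MetadataKeyPrefixer {

    // Assumed prefix for illustration; a real parser would choose its own namespace.
    static final String CUSTOM_PREFIX = "custom:";

    private MetadataKeyPrefixer() {
    }

    // Always prepends the prefix, so a file-supplied "dc:title" cannot
    // masquerade as the standard key of the same name.
    static String prefix(String prefix, String rawKey) {
        return prefix + rawKey.trim();
    }

    // Copies file-supplied entries into a new map under prefixed keys only;
    // no legacy unprefixed key is kept alongside.
    static Map<String, String> prefixAll(String prefix, Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(prefix(prefix, e.getKey()), e.getValue());
        }
        return out;
    }
}
```

Under these assumptions, calling 
MetadataKeyPrefixer.prefixAll(MetadataKeyPrefixer.CUSTOM_PREFIX, rawKeys) on 
keys read from a document would store an entry such as "dc:title" under 
"custom:dc:title" and leave no unprefixed copy behind.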



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
