[ 
https://issues.apache.org/jira/browse/TIKA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-4357:
------------------------------
    Description: 
There are several places in the codebase where we are mindlessly trusting a 
file's metadata key without namespace prefixing. This is dangerous because user 
data could overwrite metadata from Tika or do other unpleasant things.

There are other places where we were transitioning to namespace prefixes and 
left in the legacy keys without prefixes 
(https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
 

In 4.x, we should look through the codebase and ensure that we are prefixing 
custom metadata keys.

A related idea: rather than having format-specific "custom:" prefixes, we 
could use one general prefix for all file formats. WDYT?

  was:
There are several places in the codebase where we are mindlessly trusting a 
file's metadata key without namespace prefixing. There are other places where 
we were transitioning to namespace prefixes and left in the legacy keys without 
prefixes 
(https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
 

In 4.x, we should look through the codebase and ensure that we are prefixing 
custom metadata keys.


> Ensure namespace prefixes in metadata keys in 4.x
> -------------------------------------------------
>
>                 Key: TIKA-4357
>                 URL: https://issues.apache.org/jira/browse/TIKA-4357
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> There are several places in the codebase where we are mindlessly trusting a 
> file's metadata key without namespace prefixing. This is dangerous because 
> user data could overwrite metadata from Tika or do other unpleasant things.
> There are other places where we were transitioning to namespace prefixes and 
> left in the legacy keys without prefixes 
> (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
>  
> In 4.x, we should look through the codebase and ensure that we are prefixing 
> custom metadata keys.
> A related idea: rather than having format-specific "custom:" prefixes, 
> we could use one general prefix for all file formats. WDYT?
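
To make the risk and the proposed fix concrete, here is a minimal sketch in plain Java. It is not Tika code: the class name, the single "custom:" constant, and the use of plain maps in place of Tika's metadata object are all assumptions made for illustration. It shows a user-supplied key that would collide with a parser-set key being stored under a prefix instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika; plain maps stand in for a metadata object.
class CustomKeyPrefixer {

    // Assumed general prefix for all file formats; the issue leaves the actual name open.
    static final String CUSTOM_PREFIX = "custom:";

    // Always prepend the prefix, even if the raw key already looks prefixed,
    // so a file cannot choose its way into any other namespace.
    static String prefixCustomKey(String rawKey) {
        return CUSTOM_PREFIX + rawKey;
    }

    // Copy parser-set metadata as-is and add user-supplied keys under the prefix.
    static Map<String, String> merge(Map<String, String> parserMetadata,
                                     Map<String, String> userMetadata) {
        Map<String, String> merged = new LinkedHashMap<>(parserMetadata);
        for (Map.Entry<String, String> entry : userMetadata.entrySet()) {
            merged.put(prefixCustomKey(entry.getKey()), entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> parserMetadata = new LinkedHashMap<>();
        parserMetadata.put("Content-Type", "application/pdf");

        // Keys read from the file itself; the first would overwrite the parser's value
        // if it were stored without a prefix.
        Map<String, String> userMetadata = new LinkedHashMap<>();
        userMetadata.put("Content-Type", "text/html");
        userMetadata.put("Reviewer", "alice");

        System.out.println(merge(parserMetadata, userMetadata));
        // prints {Content-Type=application/pdf, custom:Content-Type=text/html, custom:Reviewer=alice}
    }
}
```

Prefixing unconditionally means a file-supplied key such as "custom:x" ends up as "custom:custom:x", which is ugly but keeps the mapping unambiguous; whether that trade-off is acceptable is part of the question raised above.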



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
