[ https://issues.apache.org/jira/browse/TIKA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4357:
------------------------------
Description:
There are several places in the codebase where we are mindlessly trusting a file's metadata keys without namespace prefixing. This is dangerous because user data could overwrite metadata from Tika or do other unpleasant things.

There are other places where we were transitioning to namespace prefixes and left the legacy, unprefixed keys in place (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).

In 4.x, we should look through the codebase and ensure that we are prefixing custom metadata keys.

A related idea is that rather than having format-specific "custom:" prefixes, we use a general prefix for all file formats...WDYT?

was:
There are several places in the codebase where we are mindlessly trusting a file's metadata keys without namespace prefixing.

There are other places where we were transitioning to namespace prefixes and left the legacy, unprefixed keys in place (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).

In 4.x, we should look through the codebase and ensure that we are prefixing custom metadata keys.

> Ensure namespace prefixes in metadata keys in 4.x
> -------------------------------------------------
>
>                 Key: TIKA-4357
>                 URL: https://issues.apache.org/jira/browse/TIKA-4357
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)
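The collision this issue describes can be illustrated with a minimal Java sketch. It assumes a plain single-valued `Map` standing in for Tika's multi-valued `Metadata` class, and a hypothetical general `custom:` prefix of the kind floated in the description; the class and method names are illustrative only, not Tika's API. A file-supplied `Content-Type` property overwrites the value the parser already set when copied verbatim, but lands in its own namespace when prefixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Tika's API): copying a file's own metadata keys
 * verbatim can clobber a key the parser already set; a namespace prefix
 * keeps the two apart.
 */
public class CustomKeyPrefixing {

    // Assumed general-purpose prefix for file-supplied keys. The issue only
    // proposes the idea of a single prefix, so this value is illustrative.
    static final String CUSTOM_PREFIX = "custom:";

    /** Copies file-supplied keys as-is: the unsafe pattern described in the issue. */
    static void copyUnprefixed(Map<String, String> metadata, Map<String, String> fileProps) {
        for (Map.Entry<String, String> e : fileProps.entrySet()) {
            metadata.put(e.getKey(), e.getValue());
        }
    }

    /** Copies file-supplied keys under the custom namespace so they cannot collide. */
    static void copyPrefixed(Map<String, String> metadata, Map<String, String> fileProps) {
        for (Map.Entry<String, String> e : fileProps.entrySet()) {
            metadata.put(CUSTOM_PREFIX + e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        // A property embedded in the file by whoever authored it.
        Map<String, String> fileProps = new LinkedHashMap<>();
        fileProps.put("Content-Type", "text/plain");

        // Unsafe: the parser's detected type is silently replaced.
        Map<String, String> unsafe = new LinkedHashMap<>();
        unsafe.put("Content-Type", "application/pdf");
        copyUnprefixed(unsafe, fileProps);
        System.out.println(unsafe); // {Content-Type=text/plain}

        // Safe: both values survive under distinct keys.
        Map<String, String> safe = new LinkedHashMap<>();
        safe.put("Content-Type", "application/pdf");
        copyPrefixed(safe, fileProps);
        System.out.println(safe); // {Content-Type=application/pdf, custom:Content-Type=text/plain}
    }
}
```

Prefixing at copy time keeps the parser's own keys authoritative. A single shared prefix, as suggested above, would also let downstream consumers find every file-supplied key with one `startsWith` check rather than one check per format-specific prefix.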