[ https://issues.apache.org/jira/browse/TIKA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4357:
------------------------------
    Labels: 4x  (was: )

> Ensure namespace prefixes in metadata keys in 4.x
> -------------------------------------------------
>
>                 Key: TIKA-4357
>                 URL: https://issues.apache.org/jira/browse/TIKA-4357
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>              Labels: 4x
>
> There are several places in the codebase where we are mindlessly trusting a
> file's metadata keys without namespace prefixing. This is dangerous because
> user data could overwrite metadata set by Tika or do other unpleasant things.
> There are other places where we began transitioning to namespace prefixes but
> left in the legacy, unprefixed keys
> (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
>
> In 4.x, we should look through the codebase and ensure that we are prefixing
> custom metadata keys.
> A related idea is that rather than having format-specific "custom:" prefixes,
> we could use a general prefix for all file formats...WDYT? For those parsers
> where we want to distinguish the raw source of the information -- I'm looking
> at you, pdf docinfo and pdf xmp! -- we could use two keys.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
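A minimal Java sketch of the proposal, under stated assumptions: the single general prefix "custom:", the class and method names, and the "pdf:docinfo" source prefix are all hypothetical, and a plain Map stands in for Tika's Metadata class so the example compiles on its own. It shows why unprefixed keys taken from a file are risky (a file-supplied "dc:title" would replace the value Tika set) and one possible reading of the "two keys" idea, where a value from a specific raw source is stored under both the general key and a source-qualified key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Tika's API. A plain Map stands in for
 * org.apache.tika.metadata.Metadata so this compiles with only the JDK.
 */
public class CustomKeyPrefixer {

    // Hypothetical single prefix shared by all file formats.
    static final String CUSTOM_PREFIX = "custom:";

    private final Map<String, String> metadata = new LinkedHashMap<>();

    /** Keys that Tika itself controls are stored verbatim (set semantics: last write wins). */
    public void setTrusted(String key, String value) {
        metadata.put(key, value);
    }

    /** Keys read from the file are always namespaced, so they cannot replace Tika's keys. */
    public void setCustom(String rawKey, String value) {
        setTrusted(CUSTOM_PREFIX + rawKey, value);
    }

    /**
     * Assumed reading of the "two keys" idea: store the value under the general
     * custom key and under a source-qualified key (e.g. PDF docinfo vs. XMP).
     */
    public void setCustomWithSource(String source, String rawKey, String value) {
        setCustom(rawKey, value);
        setTrusted(source + ":" + CUSTOM_PREFIX + rawKey, value);
    }

    public String get(String key) {
        return metadata.get(key);
    }

    public static void main(String[] args) {
        CustomKeyPrefixer md = new CustomKeyPrefixer();
        md.setTrusted("dc:title", "Quarterly report");
        // A file that declares its own "dc:title" entry lands under "custom:dc:title".
        md.setCustom("dc:title", "spoofed");
        md.setCustomWithSource("pdf:docinfo", "Department", "Finance");

        System.out.println(md.get("dc:title"));                      // Quarterly report
        System.out.println(md.get("custom:dc:title"));               // spoofed
        System.out.println(md.get("custom:Department"));             // Finance
        System.out.println(md.get("pdf:docinfo:custom:Department")); // Finance
    }
}
```

Storing both keys keeps the general key uniform across formats while still letting callers tell docinfo values apart from XMP values; the cost is duplicated entries in the output.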