[ https://issues.apache.org/jira/browse/TIKA-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4387:
------------------------------
    Description:
{{FilenameUtils.getSuffixFromPath()}} isn't checking that the extension contains only alphanumeric characters.

If a "file path" derives from an internal path in a pst, like so {{/Début du fichier de données Outlook/[WEBINAR] - "Introducing Couchbase Server 2.5"}}, then the extension is {{.5"}}, which causes problems on Windows.

The problem happens when TemporaryResources goes to write a temp file and tries to maintain the file extension based on the {{resourceName}} in the Metadata.

We should add a check that the extension contains only alphanumerics? Or something?

  was:
{{FilenameUtils.getSuffixFromPath()}} isn't checking that the extension contains only alphanumeric characters.

If a "file path" derives from an internal path in a pst, like so {{/Début du fichier de données Outlook/[WEBINAR] - "Introducing Couchbase Server 2.5"}}, then the extension is {{.5"}}, which causes problems on Windows.

> Improve robustness of file extension parsing
> --------------------------------------------
>
>                 Key: TIKA-4387
>                 URL: https://issues.apache.org/jira/browse/TIKA-4387
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
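One possible shape for the alphanumeric check proposed in the description is sketched below. This is not Tika's actual code or the eventual fix: the class name {{SuffixSketch}}, the method {{safeSuffix}}, and the 10-character limit on the extension are all assumptions made for illustration. The idea is to accept a suffix only when it is a dot followed by ASCII letters or digits at the very end of the last path segment, and to fall back to an empty suffix otherwise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
class SuffixSketch {

    // Assumed rule: a dot followed by 1 to 10 ASCII letters or digits,
    // anchored at the very end of the file name.
    private static final Pattern SAFE_SUFFIX =
            Pattern.compile("\\.[A-Za-z0-9]{1,10}\\z");

    /**
     * Returns the extension including the leading dot, or an empty string
     * when the last path segment does not end in an alphanumeric-only extension.
     */
    static String safeSuffix(String path) {
        if (path == null) {
            return "";
        }
        // Only the last path segment matters; accept both separators.
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(sep + 1);
        Matcher m = SAFE_SUFFIX.matcher(name);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // The pst-internal path from the issue (accents written as Unicode escapes).
        String pstPath = "/D\u00e9but du fichier de donn\u00e9es Outlook/"
                + "[WEBINAR] - \"Introducing Couchbase Server 2.5\"";
        System.out.println("[" + safeSuffix(pstPath) + "]");
        System.out.println("[" + safeSuffix("/tmp/report.PDF") + "]");
    }
}
```

With this rule the path from the issue yields an empty suffix, because the name ends in a double quote rather than in an alphanumeric run, while {{/tmp/report.PDF}} still yields {{.PDF}}. Returning an empty string rather than throwing would let a caller such as the temp-file logic fall back to a default extension, which is again an assumption about how the check would be used.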