[ https://issues.apache.org/jira/browse/TIKA-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930457#comment-17930457 ]
ASF GitHub Bot commented on TIKA-4387:
--------------------------------------

tballison merged PR #2143:
URL: https://github.com/apache/tika/pull/2143

> Improve robustness of file extension parsing
> --------------------------------------------
>
>                 Key: TIKA-4387
>                 URL: https://issues.apache.org/jira/browse/TIKA-4387
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> {{FilenameUtils.getSuffixFromPath()}} isn't checking that the extension
> contains only alphanumeric characters.
> If a "file path" derives from an internal path in a pst, like so {{/Début du
> fichier de données Outlook/[WEBINAR] - "Introducing Couchbase Server 2.5"}},
> then the extension is {{.5"}}, which causes problems on Windows.
> The problem happens when TemporaryResources goes to write a temp file and
> tries to maintain the file extension based on the {{resourceName}} in the
> Metadata.
> We should add a check that the extension contains only alphanumerics? Or
> something?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
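
For illustration only, here is a minimal sketch of the kind of check the issue proposes: keep the extension only if it is made up of alphanumeric characters. This is not the code merged in PR #2143. The class name {{SafeSuffixSketch}}, the method {{safeSuffix}}, the 10-character cap, and the choice to return an empty string for a rejected extension are all assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Tika's FilenameUtils.
final class SafeSuffixSketch {

    // Assumption: a usable extension is a dot followed by 1 to 10 ASCII letters or digits.
    private static final Pattern SAFE_SUFFIX = Pattern.compile("\\.[A-Za-z0-9]{1,10}");

    // Returns the extension including the leading dot, or "" if there is
    // no extension or it contains anything other than ASCII alphanumerics.
    static String safeSuffix(String path) {
        if (path == null) {
            return "";
        }
        // Look only at the last path segment, for either separator style.
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return "";
        }
        String suffix = name.substring(dot);
        return SAFE_SUFFIX.matcher(suffix).matches() ? suffix : "";
    }

    public static void main(String[] args) {
        // An ordinary file name keeps its extension.
        System.out.println("[" + safeSuffix("/docs/report.pdf") + "]");
        // A shortened version of the pst-internal path from the issue: the
        // candidate extension contains a double quote, so it is dropped.
        System.out.println("[" + safeSuffix("/folder/[WEBINAR] - \"Introducing Couchbase Server 2.5\"") + "]");
    }
}
```

With a check like this, a temp file created for the pst-internal path above would get no extension at all rather than one containing a double quote, which Windows does not allow in file names.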