[ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924145#comment-17924145 ]
Subbu edited comment on TIKA-1180 at 2/5/25 3:46 PM:
-----------------------------------------------------

[~tallison] Sure thing. I can help with a PR as above, but I want to check with the OP of the fix first, to respect their efforts here, in case they are planning a PR.


was (Author: JIRAUSER307746):
[~tallison] Sure thing. I can help with a PR as above, but I want to check with the OP of the fix first, to respect their efforts here. Let me know if it's okay to create a PR, since it's been a couple of years in this thread.


> Better Matroska MKV and WEBM Detection
> --------------------------------------
>
>                 Key: TIKA-1180
>                 URL: https://issues.apache.org/jira/browse/TIKA-1180
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Nick Burch
>            Priority: Major
>              Labels: new-parser
>         Attachments: sample-mkv.noext, sample-webm.noext
>
>
> Following the work on TIKA-1177, we now have mimetype entries for the various
> formats which are based on the Matroska container (mkv, mka, webm etc).
> However, we are unable to properly identify the specific type just from some
> mime magic.
> Instead, for fully accurate detection, we'll need a new Detector for the
> Matroska family, which does some very simple container/stream processing to
> work out what the container contains.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)