[ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924075#comment-17924075 ]
Tim Allison commented on TIKA-1180:
-----------------------------------

A PR would be helpful.

> Better Matroska MKV and WEBM Detection
> --------------------------------------
>
>                 Key: TIKA-1180
>                 URL: https://issues.apache.org/jira/browse/TIKA-1180
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Nick Burch
>            Priority: Major
>              Labels: new-parser
>         Attachments: sample-mkv.noext, sample-webm.noext
>
>
> Following the work on TIKA-1177, we now have mimetype entries for the various
> formats which are based on the Matroska container (mkv, mka, webm etc).
> However, we are unable to properly identify the specific type just from some
> mime magic
> Instead, for fully accurate detection, we'll need a new Detector for the
> Matroska family, which does some very simple container/stream processing to
> work out what the container contains

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
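
For illustration only, here is a rough sketch of the kind of "very simple container/stream processing" the issue asks for. It is not the Tika implementation and does not use any Tika API. The class name MatroskaDocTypeSketch and its methods are hypothetical. It assumes the caller has already buffered the first bytes of the file, walks the EBML header, and returns the DocType string ("matroska" or "webm"). The mapping from DocType to a media type is also an assumption. Telling audio-only files (mka) from video files (mkv) would additionally require inspecting the track entries, which this sketch does not attempt.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads the EBML DocType from a buffered file prefix. */
public class MatroskaDocTypeSketch {
    private static final long EBML_HEADER_ID = 0x1A45DFA3L; // EBML header element
    private static final long DOC_TYPE_ID = 0x4282L;        // DocType element

    /** Read position over the buffered prefix. */
    private static final class Cursor {
        final byte[] data;
        int pos;
        Cursor(byte[] data) { this.data = data; }
    }

    /**
     * Reads an EBML variable-length integer. The count of leading zero bits in
     * the first byte gives the total length. Element IDs keep the marker bit,
     * element sizes drop it. Returns -1 on truncated or unsupported input.
     */
    private static long readVint(Cursor c, boolean keepMarker) {
        if (c.pos >= c.data.length) return -1;
        int first = c.data[c.pos] & 0xFF;
        if (first == 0) return -1; // would be longer than 8 bytes
        int length = Integer.numberOfLeadingZeros(first) - 24 + 1;
        if (c.pos + length > c.data.length) return -1;
        long value = keepMarker ? first : first & (0xFF >> length);
        for (int i = 1; i < length; i++) {
            value = (value << 8) | (c.data[c.pos + i] & 0xFF);
        }
        c.pos += length;
        return value;
    }

    /** Returns the DocType string, or null if the prefix is not a usable EBML header. */
    public static String readDocType(byte[] prefix) {
        Cursor c = new Cursor(prefix);
        if (readVint(c, true) != EBML_HEADER_ID) return null;
        long headerSize = readVint(c, false);
        if (headerSize < 0) return null;
        long headerEnd = Math.min(c.pos + headerSize, (long) prefix.length);
        while (c.pos < headerEnd) {
            long id = readVint(c, true);
            long size = readVint(c, false);
            if (id < 0 || size < 0 || c.pos + size > headerEnd) return null;
            if (id == DOC_TYPE_ID) {
                // DocType is an ASCII string and may be padded with NUL bytes
                return new String(prefix, c.pos, (int) size, StandardCharsets.US_ASCII).trim();
            }
            c.pos += (int) size; // skip elements we do not care about
        }
        return null;
    }

    /** Assumed mapping from DocType to a media type; audio variants are not distinguished. */
    public static String guessMediaType(byte[] prefix) {
        String docType = readDocType(prefix);
        if ("webm".equals(docType)) return "video/webm";
        if ("matroska".equals(docType)) return "video/x-matroska";
        return null;
    }
}
```

A real Detector would need to mark and reset the input stream around the read, and would fall back to the generic mime magic result when the DocType is missing or unrecognised.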