[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894501#action_12894501 ]
Alex Ott commented on TIKA-447:
-------------------------------

Nick: will this make it possible to support self-extracting archives? If we implement that support as a separate checker, the checker will need its own archive detection and extraction logic, which could lead to code duplication.

> Container aware mimetype detection
> ----------------------------------
>
>                 Key: TIKA-447
>                 URL: https://issues.apache.org/jira/browse/TIKA-447
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>         Attachments: TikaContainerDetection.patch
>
>
> As discussed on the dev list, Tika should ideally have a configurable way to
> process container based formats (e.g. zip files and ole2 files) when trying to
> detect the correct mime type for a document.
> This needs to be configurable, because some people won't want Tika to have to
> do all the work of parsing the whole file when they're not interested in
> knowing exactly what's in it.
> Once we have gone to the trouble of opening and parsing the container file,
> we should try to keep the open container around to speed up parsing of the
> contents.
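
To make the "configurable" part of the issue description concrete, here is a minimal sketch of container-aware detection using only the JDK's java.util.zip classes. It is not the attached patch and not Tika's API: the class name ContainerAwareDetector, the inspectContainers flag, and the zipOf helper are all hypothetical, and only zip containers are handled (ole2 is left out).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: none of these names come from Tika or from the patch.
final class ContainerAwareDetector {
    static final String ZIP = "application/zip";
    static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Local file header signature that a plain zip file starts with.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Assumption: a single on/off switch stands in for "configurable".
    private final boolean inspectContainers;

    ContainerAwareDetector(boolean inspectContainers) {
        this.inspectContainers = inspectContainers;
    }

    String detect(byte[] data) {
        if (!startsWith(data, ZIP_MAGIC)) {
            return "application/octet-stream";
        }
        if (!inspectContainers) {
            // Cheap path: report the container type without opening it.
            return ZIP;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("word/document.xml")) return DOCX;
                if (name.equals("xl/workbook.xml")) return XLSX;
            }
        } catch (IOException e) {
            // Unreadable container: fall back to the magic-based answer below.
        }
        return ZIP;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Test helper: builds an in-memory zip whose entries have the given names.
    static byte[] zipOf(String... entryNames) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

With inspection switched off, a .docx is reported as application/zip; with it switched on, the same bytes are reported as a word-processing document. A self-extracting archive begins with an executable stub rather than the zip signature, so this check would not match it, which is why sharing the container-opening step between checkers matters for avoiding duplicated code.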