[ https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tetiana Tvardovska updated TIKA-3590:
-------------------------------------
    Component/s: detector

> OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
> -----------------------------------------------------------------------
>
>                 Key: TIKA-3590
>                 URL: https://issues.apache.org/jira/browse/TIKA-3590
>             Project: Tika
>          Issue Type: Bug
>          Components: core, detector
>    Affects Versions: 1.26, 1.27, 2.0.0-ALPHA, 2.0.0-BETA, 2.1.0
>            Reporter: Tetiana Tvardovska
>            Priority: Major
>
> Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns a wrong
> value.
> DMG files are detected as MIME type {{"application/zlib"}} or
> {{"application/x-bzip"}} instead of the expected
> {{"application/x-apple-diskimage"}}.
>
> The error is caused by the {{getSupertype}} method, which returns a type
> that is too general ({{MediaType.OCTET_STREAM}}) for OSX DMG files instead
> of {{"application/zlib"}} or {{"application/x-bzip"}}.
>
> For information, the DMG MIME type is detected correctly at this point when
> debugging the method:
>
> {code:java}
> // org/apache/tika/mime/MimeTypes.java:484
> public MediaType detect(...
> // line 522:
>     MimeType hint = getMimeType(name);
> {code}
> Here the {{hint}} value is the correct {{"application/x-apple-diskimage"}}.
> But later the {{hint}} value is not taken into account for
> {{possibleTypes}}, as the result of {{applyHint}} shows:
>
> {code:java}
> // line 529:
>     possibleTypes = applyHint(possibleTypes, hint);
> {code}
>
> This wrong value is then returned to:
>
> {code:java}
> // repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
> MediaType detected = detector.detect(input, metadata);
> if (registry.isSpecializationOf(detected, type)) {
>     type = detected;
> }
> {code}
>
>
> h3. Possible solution
> Add a more precise supertype detection for the
> {{application/x-apple-diskimage}} type.
> Just add one more check to the {{MediaTypeRegistry.getSupertype}} method,
> for example, in a diff-like format:
> {{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
> {{org/apache/tika/mime/MediaTypeRegistry.java:187}}
>
> {code:java}
> public MediaType getSupertype(MediaType type) {
>     ...
> +   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
> +       return MediaType.application("x-bzip");
> +   }
>     ...
> }
> {code}
>
> or
> {code:java}
> public MediaType getSupertype(MediaType type) {
>     ...
> +   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
> +       return MediaType.APPLICATION_ZIP;
> +   }
>     ...
> }
> {code}
>
>
> ---
> Tested with the [Sonatype Nexus|https://github.com/sonatype/nexus-public/]
> project, {{release-3.36.0-01}}, on a RAW repository with
> "Strict Content Type Validation" set to ON, when trying to upload *.dmg files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
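
For readers following the report, the interaction between the name-based hint and the supertype lookup can be modeled with a small self-contained sketch. This is not Tika's code: the class `SupertypeSketch`, its string-keyed map, and the simplified `applyHint` are hypothetical stand-ins, written under the assumption that a hint replaces a magic-based type only when it equals that type or is a specialization of it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a media type registry (not Tika's implementation).
public class SupertypeSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String DMG = "application/x-apple-diskimage";

    // Declared supertype relations: child type -> parent type.
    private final Map<String, String> supertypes = new HashMap<>();

    void addSupertype(String type, String supertype) {
        supertypes.put(type, supertype);
    }

    // Types without a declared parent fall back to octet-stream, the root of the hierarchy.
    String getSupertype(String type) {
        if (type.equals(OCTET_STREAM)) {
            return null;
        }
        return supertypes.getOrDefault(type, OCTET_STREAM);
    }

    // True if b appears somewhere on a's supertype chain.
    boolean isSpecializationOf(String a, String b) {
        for (String s = getSupertype(a); s != null; s = getSupertype(s)) {
            if (s.equals(b)) {
                return true;
            }
        }
        return false;
    }

    // Assumed hint rule: the name-based hint wins only if it refines the magic-based type.
    String applyHint(String magicType, String hint) {
        if (hint.equals(magicType) || isSpecializationOf(hint, magicType)) {
            return hint;
        }
        return magicType;
    }

    public static void main(String[] args) {
        SupertypeSketch registry = new SupertypeSketch();

        // No relation declared: the DMG hint is discarded in favor of the magic result.
        System.out.println(registry.applyHint("application/x-bzip", DMG));

        // With a declared supertype, the hint refines the bzip detection.
        registry.addSupertype(DMG, "application/x-bzip");
        System.out.println(registry.applyHint("application/x-bzip", DMG));

        // A zlib-compressed image is still not covered by that single relation.
        System.out.println(registry.applyHint("application/zlib", DMG));
    }
}
```

In this model, a single declared supertype covers only one of the two compression variants mentioned in the report, so a zlib-detected image would still keep the magic-based type.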