[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283780#comment-16283780 ]
Tim Allison edited comment on TIKA-2483 at 12/8/17 4:38 PM:
------------------------------------------------------------

Regression tests in prep for 1.17 found a blocker caused by an incomplete fix to this issue. We need to add quite a few more specializations of zip and tar to the check, to avoid overwriting their mime types with "zip". Lots of files that were identified as kmz, tika-ooxml, etc. in 1.16 are now being identified as "zip" during the parse in 1.17-SNAPSHOT.

The current patch builds the list semi-manually, which I abhor, but I added a test to make sure that PackageParser's list of specializations stays current with TikaConfig's default config.

After 1.17 is released, we can work towards getting rid of serialization of parsers in ForkParser and/or making TikaConfig serializable. Until we do that, I don't see an elegant solution.


was (Author: talli...@mitre.org):
Regression tests in prep for 1.17 found a blocker caused by a botched fix to this issue. We need to add quite a few more specializations of zip and tar to the check, to avoid overwriting their mime types with "zip". Lots of files that were identified as kmz, tika-ooxml, etc. in 1.16 are now being identified as "zip" during the parse in 1.17-SNAPSHOT.

The current patch builds the list semi-manually, which I abhor, but I added a test to make sure that PackageParser's list of specializations stays current with TikaConfig's default config.

After 1.17 is released, we can work towards getting rid of serialization of parsers in ForkParser and/or making TikaConfig serializable. Until we do that, I don't see an elegant solution.

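To illustrate the two ideas in the comment above (not overwriting a more specific type with "zip", and a test that keeps a hand-maintained list in step with a registry), here is a minimal plain-Java sketch. It is not the actual patch: SpecializationGuard, resolveType, missingFromList and the child-to-parent map are all hypothetical names, the map is only a stand-in for whatever TikaConfig's default registry exposes, and the two listed types are illustrative entries taken from the kmz and tika-ooxml cases mentioned above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are Tika classes or methods.
final class SpecializationGuard {

    static final String ZIP = "application/zip";

    // Hand-maintained list, standing in for the semi-manual list in the patch.
    static final Set<String> ZIP_SPECIALIZATIONS = new HashSet<>();
    static {
        ZIP_SPECIALIZATIONS.add("application/vnd.google-earth.kmz");
        ZIP_SPECIALIZATIONS.add("application/x-tika-ooxml");
    }

    // Keep an already-recorded specialization instead of downgrading it to plain zip.
    static String resolveType(String existingType, String detectedType) {
        if (ZIP.equals(detectedType)
                && existingType != null
                && ZIP_SPECIALIZATIONS.contains(existingType)) {
            return existingType;
        }
        return detectedType;
    }

    // Types that descend from zip in the registry but are absent from the hand-maintained list.
    // childToParent is an assumed, simplified view of a media type registry.
    static Set<String> missingFromList(Map<String, String> childToParent) {
        Set<String> missing = new TreeSet<>();
        for (String type : childToParent.keySet()) {
            if (descendsFromZip(type, childToParent) && !ZIP_SPECIALIZATIONS.contains(type)) {
                missing.add(type);
            }
        }
        return missing;
    }

    // Walks the parent chain; the hop limit guards against accidental cycles in the map.
    static boolean descendsFromZip(String type, Map<String, String> childToParent) {
        String parent = childToParent.get(type);
        int hops = 0;
        while (parent != null && hops++ < childToParent.size()) {
            if (ZIP.equals(parent)) {
                return true;
            }
            parent = childToParent.get(parent);
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("application/vnd.google-earth.kmz", ZIP);
        registry.put("application/x-tika-ooxml", ZIP);
        System.out.println(resolveType("application/vnd.google-earth.kmz", ZIP));
        System.out.println(missingFromList(registry));
    }
}
```

A consistency test in this style would fail whenever missingFromList returns a non-empty set, which is the kind of drift the added test is meant to catch. The hard-coded list is still needed in this sketch for the same reason given above: the registry itself cannot be shipped to the forked process.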
> Using PackageParser in ForkParser causes NPE
> --------------------------------------------
>
>                 Key: TIKA-2483
>                 URL: https://issues.apache.org/jira/browse/TIKA-2483
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: TzeKai Lee
>         Attachments: testForkedPackageParsing.patch
>
>
> {quote}
> Caused by: java.lang.NullPointerException
>       at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
>       at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>       at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:78)
>       at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:242)
>       at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:379)
>       at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:165)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {quote}
> The mediaTypeRegistry handling code in parse() of PackageParser seems to cause
> the problem, because ForkParser cannot properly construct the default TikaConfig.
> Also, since TikaConfig is not serializable, there is no way to assign
> mediaTypeRegistry/bufferedMediaTypeRegistry before calling parse().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)