[ 
https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283780#comment-16283780
 ] 

Tim Allison commented on TIKA-2483:
-----------------------------------

Regression tests in prep for 1.17 show that we need to add quite a few more 
specializations of zip and tar to check for and avoid overwriting. Lots of 
files that were previously identified as kmz, tika-ooxml, etc. were now being 
identified as "zip" during the parse.

The current patch builds the list semi-manually, which I abhor, but I added a 
test to make sure that PackageParser's list of specializations stays current 
with TikaConfig's default config.
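
The kind of check described above can be sketched roughly as follows. This is 
not the test from the patch and does not use Tika's classes: the SUPERTYPES 
map is a hypothetical stand-in for a media type registry, HAND_MAINTAINED is a 
hypothetical hard-coded list, and the entries are illustrative data only. The 
idea is simply to walk the registry, collect every type that specializes zip 
or tar, and report any that the hand-maintained list has missed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SpecializationListCheck {

    // Hypothetical stand-in for a registry: media type -> direct supertype.
    // Illustrative entries only; not read from Tika's real configuration.
    static final Map<String, String> SUPERTYPES = new HashMap<>();
    static {
        SUPERTYPES.put("application/vnd.google-earth.kmz", "application/zip");
        SUPERTYPES.put("application/x-tika-ooxml", "application/zip");
        SUPERTYPES.put("application/epub+zip", "application/zip");
        SUPERTYPES.put("application/x-gtar", "application/x-tar");
    }

    // Hypothetical hand-maintained list; epub is deliberately left out.
    static final Set<String> HAND_MAINTAINED = new TreeSet<>(Arrays.asList(
            "application/vnd.google-earth.kmz",
            "application/x-tika-ooxml",
            "application/x-gtar"));

    // True if 'ancestor' appears anywhere on the supertype chain of 'type'.
    static boolean isSpecializationOf(String type, String ancestor) {
        String current = SUPERTYPES.get(type);
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = SUPERTYPES.get(current);
        }
        return false;
    }

    // Registry types that specialize one of the package types
    // but are absent from the hand-maintained list.
    static Set<String> missingSpecializations(Set<String> handMaintained,
                                              String... packageTypes) {
        Set<String> missing = new TreeSet<>();
        for (String type : SUPERTYPES.keySet()) {
            for (String pkg : packageTypes) {
                if (isSpecializationOf(type, pkg) && !handMaintained.contains(type)) {
                    missing.add(type);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A real test would fail when this set is non-empty.
        System.out.println(missingSpecializations(
                HAND_MAINTAINED, "application/zip", "application/x-tar"));
    }
}
```

A test built this way fails as soon as a new zip or tar specialization shows 
up in the registry without a matching entry in the hard-coded list, which is 
what keeps the two from drifting apart.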

After 1.17 is released, we can either work towards getting rid of serialization 
of parsers in ForkParser and/or making TikaConfig serializable.  Until we do 
that, I don't see an elegant solution.
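
As a rough illustration of the serialization constraint mentioned above, here 
is a minimal sketch using plain Java serialization. The Config, 
ParserWithTransientConfig and ParserWithPlainConfig classes are hypothetical 
stand-ins, not Tika classes, and the sketch does not reproduce the NPE in the 
reported stack trace. It only shows the two options open to a serializable 
object that holds a non-serializable field: serialization fails outright, or 
the field is marked transient and comes back null on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientConfigDemo {

    // Hypothetical config object that does NOT implement Serializable.
    static class Config {
        final String name;
        Config(String name) { this.name = name; }
    }

    // Hypothetical parser whose config is transient: it is skipped on write.
    static class ParserWithTransientConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Config config;
        ParserWithTransientConfig(Config config) { this.config = config; }
    }

    // Hypothetical parser whose config is a normal field: writing it fails.
    static class ParserWithPlainConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        Config config;
        ParserWithPlainConfig(Config config) { this.config = config; }
    }

    // Serializes to a byte array; returns null if a field is not serializable.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (NotSerializableException e) {
            return null;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Reads an object back from the bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Config config = new Config("default");

        // Non-transient, non-serializable field: serialization is refused.
        System.out.println(serialize(new ParserWithPlainConfig(config)) == null);

        // Transient field: the round trip works, but the config is lost.
        ParserWithTransientConfig copy = (ParserWithTransientConfig)
                deserialize(serialize(new ParserWithTransientConfig(config)));
        System.out.println(copy.config == null);
    }
}
```

Either way the receiving side has to rebuild or look up its own config, which 
is why the two directions above (no parser serialization, or a serializable 
TikaConfig) are the ones that would remove the problem rather than work 
around it.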

> Using PackageParser in ForkParser causes NPE
> --------------------------------------------
>
>                 Key: TIKA-2483
>                 URL: https://issues.apache.org/jira/browse/TIKA-2483
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: TzeKai Lee
>         Attachments: testForkedPackageParsing.patch
>
>
> {quote}
> Caused by: java.lang.NullPointerException
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
>         at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>         at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:78)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:242)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:379)
>         at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:165)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {quote}
> The mediaTypeRegistry handling code in parse() of PackageParser seems to 
> cause the problem, because ForkParser cannot properly construct the default 
> TikaConfig. Also, since TikaConfig is not serializable, there is no way to 
> assign mediaTypeRegistry/bufferedMediaTypeRegistry before calling parse().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
