[ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283780#comment-16283780 ]

Tim Allison edited comment on TIKA-2483 at 12/8/17 4:38 PM:
------------------------------------------------------------

Regression tests in prep for 1.17 found a blocker caused by an incomplete fix
to this issue. We need to add quite a few more specializations of zip and tar
to the list that PackageParser checks, so that it does not overwrite their more
specific mime types with zip. Lots of files that were identified as kmz,
tika-ooxml, etc. in 1.16 are now being identified as "zip" during the parse in
1.17-SNAPSHOT.
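
A minimal sketch of the kind of guard described above, assuming media types are compared as plain strings rather than through Tika's MediaType and registry classes. The class ZipTypeGuard, its resolve() method, and the entries in ZIP_SPECIALIZATIONS are hypothetical and illustrative only; they are not the list or the code in the patch. Tar specializations would need an analogous set.

```java
import java.util.Set;

// Hypothetical sketch, not the code in the patch: media types are plain
// strings so the example compiles without Tika on the classpath.
final class ZipTypeGuard {

    // Illustrative (assumed) specializations of application/zip; the real,
    // longer list is maintained in the patch and is not reproduced here.
    static final Set<String> ZIP_SPECIALIZATIONS = Set.of(
            "application/vnd.google-earth.kmz",
            "application/x-tika-ooxml",
            "application/java-archive",
            "application/epub+zip");

    private ZipTypeGuard() { }

    /**
     * Chooses the content type to report after the package parser has
     * detected its container type. A more specific type that was already
     * detected wins over a generic "application/zip".
     */
    static String resolve(String alreadyDetected, String containerType) {
        boolean genericZip = "application/zip".equals(containerType);
        if (genericZip && alreadyDetected != null
                && ZIP_SPECIALIZATIONS.contains(alreadyDetected)) {
            return alreadyDetected; // keep kmz, ooxml, ... instead of downgrading
        }
        return containerType;
    }

    public static void main(String[] args) {
        // Usage: a kmz file keeps its type; an undetected file is reported as zip.
        System.out.println(resolve("application/vnd.google-earth.kmz", "application/zip"));
        System.out.println(resolve(null, "application/zip"));
    }
}
```

The regression in 1.17-SNAPSHOT corresponds to a type that is missing from such a set: the guard does not fire and the file is reported as plain zip.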

The current patch includes the list semi-manually, which I abhor, but I added a
test to make sure that PackageParser's list of specializations stays current
with TikaConfig's default config.
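
A hypothetical sketch of the consistency check mentioned above, assuming the default config's type hierarchy can be exposed as a child-to-supertype map. SpecializationCheck, descendantsOf() and missingFrom() are illustrative names, not Tika APIs; a real test would read the hierarchy from TikaConfig's default media type registry instead of a hand-built map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a "keep the hand-maintained list current" check.
// The supertype map stands in for the default media type registry; it is
// NOT the real Tika registry or API.
final class SpecializationCheck {

    private SpecializationCheck() { }

    /** Returns every type whose supertype chain reaches {@code root}. */
    static Set<String> descendantsOf(String root, Map<String, String> supertypeOf) {
        Set<String> result = new TreeSet<>();
        for (String type : supertypeOf.keySet()) {
            Set<String> seen = new HashSet<>(); // guards against a malformed, cyclic map
            String parent = supertypeOf.get(type);
            while (parent != null && seen.add(parent)) {
                if (parent.equals(root)) {
                    result.add(type);
                    break;
                }
                parent = supertypeOf.get(parent);
            }
        }
        return result;
    }

    /** Types the registry knows about but the hand-written list is missing. */
    static Set<String> missingFrom(Set<String> handWritten, String root,
                                   Map<String, String> supertypeOf) {
        Set<String> missing = descendantsOf(root, supertypeOf);
        missing.removeAll(handWritten);
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("application/x-tika-ooxml", "application/zip");
        registry.put("application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "application/x-tika-ooxml");
        registry.put("application/vnd.google-earth.kmz", "application/zip");
        Set<String> handWritten = Set.of("application/x-tika-ooxml");
        // Prints the two types that the hand-written list forgot.
        System.out.println(missingFrom(handWritten, "application/zip", registry));
    }
}
```

A unit test would then assert that missingFrom() returns an empty set, so adding a new zip subtype to the default config without updating the list fails the build.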

After 1.17 is released, we can work towards getting rid of the serialization
of parsers in ForkParser, making TikaConfig serializable, or both. Until we do
that, I don't see an elegant solution.
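
A minimal sketch of the serialization constraint, assuming (as the comment above indicates) that ForkParser ships the parser to the forked JVM via Java serialization. Config is a hypothetical stand-in for the non-serializable TikaConfig, and EagerParser, LazyParser and SerializationDemo are illustrative names, not Tika classes. A parser that holds the config directly fails with NotSerializableException; a transient field survives serialization only as null and has to be rebuilt on the receiving side, which is the step that fails with the NPE in the stack trace quoted below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins: Config plays the role of a non-serializable
// TikaConfig; the two parser classes are not Tika classes.
final class SerializationDemo {

    static final class Config { } // does not implement Serializable

    static final class EagerParser implements Serializable {
        private static final long serialVersionUID = 1L;
        final Config config = new Config(); // written with the parser, so serialization fails
    }

    static final class LazyParser implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Config config; // skipped by serialization, null after deserializing

        Config getConfig() {
            // Rebuilt on first use on the receiving side; in the forked JVM
            // this rebuild is the step that must not fail.
            if (config == null) {
                config = new Config();
            }
            return config;
        }
    }

    private SerializationDemo() { }

    /** Serializes and deserializes an object, as a fork boundary would. */
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            roundTrip(new EagerParser());
        } catch (NotSerializableException e) {
            System.out.println("EagerParser cannot be serialized: " + e.getMessage());
        }
        LazyParser copy = (LazyParser) roundTrip(new LazyParser());
        System.out.println("config after round trip is null: " + (copy.config == null));
        System.out.println("config after first use is null: " + (copy.getConfig() == null));
    }
}
```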


was (Author: talli...@mitre.org):
Regression tests in prep for 1.17 found a blocker caused by a botched fix to
this issue. We need to add quite a few more specializations of zip and tar to
the list that PackageParser checks, so that it does not overwrite their more
specific mime types with zip. Lots of files that were identified as kmz,
tika-ooxml, etc. in 1.16 are now being identified as "zip" during the parse in
1.17-SNAPSHOT.

The current patch includes the list semi-manually, which I abhor, but I added a
test to make sure that PackageParser's list of specializations stays current
with TikaConfig's default config.

After 1.17 is released, we can work towards getting rid of the serialization
of parsers in ForkParser, making TikaConfig serializable, or both. Until we do
that, I don't see an elegant solution.

> Using PackageParser in ForkParser causes NPE
> --------------------------------------------
>
>                 Key: TIKA-2483
>                 URL: https://issues.apache.org/jira/browse/TIKA-2483
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: TzeKai Lee
>         Attachments: testForkedPackageParsing.patch
>
>
> {quote}
> Caused by: java.lang.NullPointerException
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
>         at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>         at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:78)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:242)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:379)
>         at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:165)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {quote}
> The mediaTypeRegistry handling code in PackageParser's parse() seems to cause
> the problem, because ForkParser cannot properly construct the default
> TikaConfig. Also, since TikaConfig is not serializable, there is no way to
> assign mediaTypeRegistry/bufferedMediaTypeRegistry before calling parse().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
