[
https://issues.apache.org/jira/browse/TIKA-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056665#comment-18056665
]
Tim Allison edited comment on TIKA-4650 at 2/5/26 1:46 PM:
-----------------------------------------------------------
And, yes, *waves hands at microbenchmarks*. There are lots of areas for
improvement, but I think this is enough information to stop worrying that the
new features are a problem. It is also enough info to improve 3.x by reusing the
zip file that was opened during detection (:face-palm:).
> Improve zip parsing in 4.x
> --------------------------
>
> Key: TIKA-4650
> URL: https://issues.apache.org/jira/browse/TIKA-4650
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> Zip parsing has a number of quirks that require special processing. Over time
> those have accreted in the PackageParser. Further, the zip detector and the
> zip parser are not well coordinated; there are some areas where we could
> streamline the detect+parse steps.
> Let's create a standalone zip parser and improve the coordination between
> detection and parsing for zip files.
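
The idea in the comment above, reusing the zip file opened during detection rather than opening it a second time for parsing, can be sketched roughly as follows. This is only an illustration in Python using the standard `zipfile` module, not Tika's Java code or API: the function names `detect`, `parse`, and `detect_and_parse`, and the entry-name rules used for detection, are all hypothetical.

```python
import io
import zipfile


def detect(zf):
    """Guess a coarse type from entry names (hypothetical rules, not Tika's)."""
    names = set(zf.namelist())
    if "[Content_Types].xml" in names:
        return "ooxml"
    if "META-INF/MANIFEST.MF" in names:
        return "jar"
    return "zip"


def parse(zf):
    """Read every non-directory entry from an already-open archive."""
    return {
        info.filename: zf.read(info)
        for info in zf.infolist()
        if not info.is_dir()
    }


def detect_and_parse(data):
    # Open the archive once; detection and parsing share the same handle,
    # so the central directory is only read a single time.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return detect(zf), parse(zf)
```

The point of the sketch is only the shape of the call: the detector and the parser both take the open archive as an argument, so neither needs to reopen the underlying bytes.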