[https://issues.apache.org/jira/browse/TIKA-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077835#comment-18077835]
Tim Allison commented on TIKA-4683:
-----------------------------------
Nice catch, [~tilman]! PR opened to fix that.
The PR also includes a fix for digesting OLE files. That one bug caused the
changes in OLE detection, the zero-byte exceptions and the different numbers
of attachments: one bug, three signals.
Encoding changes are in the noise and intentional; they come from mappings we
borrowed from the StandardHtmlEncodingDetector.
Content diffs in gz files etc. are expected because we no longer dump
attachment names into the content stream by default.
Rerunning shortly... Getting closer.
> Prep for 4.0.0-ALPHA release
> ----------------------------
>
> Key: TIKA-4683
> URL: https://issues.apache.org/jira/browse/TIKA-4683
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: reports-20260429.tar.gz, reports-20260502.tar.gz,
> reports-4.0.0-20260411.tgz, reports.tar.gz
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)