[ 
https://issues.apache.org/jira/browse/TIKA-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077835#comment-18077835
 ] 

Tim Allison commented on TIKA-4683:
-----------------------------------

Nice catch, [~tilman]! PR opened to fix that.

Also a fix for digesting OLE. That bug led to the changes in OLE detection, the
zero-byte exceptions and the different numbers of attachments – one bug, three
signals.

Encoding changes are in the noise and intentional, based on mappings we
borrowed from the standardhtml encoding detector.

Content diffs in gz etc. are expected because we no longer dump attachment
names into the content stream by default.

Rerunning shortly... Getting closer.

> Prep for 4.0.0-ALPHA release
> ----------------------------
>
>                 Key: TIKA-4683
>                 URL: https://issues.apache.org/jira/browse/TIKA-4683
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: reports-20260429.tar.gz, reports-20260502.tar.gz, 
> reports-4.0.0-20260411.tgz, reports.tar.gz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
