[ 
https://issues.apache.org/jira/browse/TIKA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950943#comment-17950943
 ] 

Tim Allison commented on TIKA-4411:
-----------------------------------

The two issues above are now fixed. We're getting 3 new ooxml exceptions and a 
few ooxml->zip changes in detection, but I _think_ these are improvements.

The new tika-eval fixes appear to work. The reports now include the internal 
path for attachments, and there are no complaints when opening xlsx in 
LibreOffice.

I'm now noticing a regression in some xhtml files (such as 
{{bug_trackers/MOZILLA/1534195-1623599/MOZILLA-1554250-6.xhtml}}). For some of 
these files, there are fewer "common tokens", or, if there are more, they are 
xhtml tags. The signals for this regression were in the earlier run...I just 
didn't notice them. :(

I'll look into these today.

> Run the 3.2.0 release process
> -----------------------------
>
>                 Key: TIKA-4411
>                 URL: https://issues.apache.org/jira/browse/TIKA-4411
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: reports-3.2.0-pre-rc1.tgz, reports-3.2.0.tgz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
