[ 
https://issues.apache.org/jira/browse/TIKA-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085362#comment-18085362
 ] 

Tim Allison edited comment on TIKA-4730 at 6/2/26 12:41 AM:
------------------------------------------------------------

I turned on the strict validator on the 4.x run. It throws exceptions for 
unbalanced XHTML. All of the "new" exceptions just identify places where 3.x 
either hit some kind of parse exception or silently emitted bad XHTML. The 
strict validator is turned off in production.

The 4.x run includes the new charset detector, which is generally doing a lot 
better but does have some problems. I'll try to quantify that.

The "new" PDF exceptions are there because I ran with the strict access 
permission setting. 548 of 553 are not a problem.

pack200 is still a problem.

This was run against 3.3.1, not against 4.0.0-alpha-1.

 



> Prep for 4.0.0-beta-1 release
> -----------------------------
>
>                 Key: TIKA-4730
>                 URL: https://issues.apache.org/jira/browse/TIKA-4730
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: reports.tar.gz
>
>
> We made a number of important fixes to the published artifacts in ASF's dist 
> repo, Maven Central and Docker.
> I think we're set on changing APIs for 4.x generally.
> Is there anything else we need for this beta release?
> I propose starting the 4.0.0-beta-1 release in two weeks. WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
