[ https://issues.apache.org/jira/browse/TIKA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950951#comment-17950951 ]
Tim Allison edited comment on TIKA-4411 at 5/12/25 12:52 PM:
-------------------------------------------------------------

K. The xhtml issue appears to be a difference in how jsoup 1.18.3 and jsoup 1.19.1 handle broken xhtml.

Note: the change in jsoup happened between 1.18.3 and 1.19.1 -- we're now using the latest version of jsoup: 1.20.1, which still has the 1.19.1 behavior.

The publicly available example file is here: https://bug1554250.bmoattachments.org/attachment.cgi?id=9068831

If anyone wants to dig into this and open an issue on jsoup (if there's a problem?!), please go for it.

I don't think this is a significant enough difference to warrant downgrading jsoup to 1.18.3.

I'll start the 3.2.0 release process shortly. I'm happy to respin if anyone disagrees or would prefer a different solution. Or, of course, if you notice any other problems!

Onwards!

> Run the 3.2.0 release process
> -----------------------------
>
>                 Key: TIKA-4411
>                 URL: https://issues.apache.org/jira/browse/TIKA-4411
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 3.2.0
>
>         Attachments: reports-3.2.0-pre-rc1.tgz, reports-3.2.0.tgz
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)