[ https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922453#comment-17922453 ]
Tim Allison commented on TIKA-4373:
-----------------------------------

jsoup 1.18.3 stops parsing around the following text, probably because it handles the unescaped < and > differently (as you pointed out, Tilman, this is bad HTML).

bq. Year of origination: 1990; LTV>= 97: 4.97%; 95<=LTV< 97: 3.50%; 90<=LTV<95: 2.69%; 0<LTV<90: 1.79%.

> Regression tests for 3.1.0 release
> ----------------------------------
>
>                 Key: TIKA-4373
>                 URL: https://issues.apache.org/jira/browse/TIKA-4373
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: 032223-jsoup-1.18.1.html, 032223-jsoup-1.18.3.html,
> S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ.csv, filter_md5_suc_url.json,
> reports-tika-3.0.0-v-3.1.0-rc1.tgz, reports_tika-3.0-vs-3.1.tgz
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)