[ 
https://issues.apache.org/jira/browse/TIKA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17922453#comment-17922453
 ] 

Tim Allison commented on TIKA-4373:
-----------------------------------

jsoup 1.18.3 stops parsing around the following text, probably because it
handles the unescaped < and > characters differently (as you pointed out,
Tilman, this is bad HTML).

bq. Year of origination: 1990; LTV>= 97: 4.97%; 95<=LTV< 97: 3.50%; 90<=LTV<95: 
2.69%; 0<LTV<90: 1.79%.
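
As a rough illustration of why text like this is fragile: in the HTML tokenizer rules, a "<" immediately followed by an ASCII letter begins a start tag, while a "<" followed by "=", a space, or a digit is kept as text. The sketch below only flags the "<" characters in a string that meet the first condition. It is not jsoup's implementation, the class and method names are hypothetical, and it does not by itself explain the difference between 1.18.1 and 1.18.3.

```java
import java.util.ArrayList;
import java.util.List;

public class TagOpenScan {
    // Hypothetical helper: returns the offsets of every '<' that is immediately
    // followed by an ASCII letter, i.e. the positions where an HTML tokenizer
    // would start reading a tag name instead of emitting literal text.
    static List<Integer> tagOpenOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            char next = text.charAt(i + 1);
            boolean asciiLetter = (next >= 'a' && next <= 'z') || (next >= 'A' && next <= 'Z');
            if (text.charAt(i) == '<' && asciiLetter) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The text quoted in the comment, joined onto one line.
        String sample = "LTV>= 97: 4.97%; 95<=LTV< 97: 3.50%; 90<=LTV<95: 2.69%; 0<LTV<90: 1.79%.";
        for (int offset : tagOpenOffsets(sample)) {
            // Print each risky '<' together with the text that follows it.
            System.out.println(offset + ": " + sample.substring(offset));
        }
    }
}
```

On the quoted text, the only "<" this flags is the one in "0<LTV", which sits close to where the parse is reported to stop. Escaping such characters as &lt; in the source HTML would sidestep the ambiguity.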

> Regression tests for 3.1.0 release
> ----------------------------------
>
>                 Key: TIKA-4373
>                 URL: https://issues.apache.org/jira/browse/TIKA-4373
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 3.1.0
>
>         Attachments: 032223-jsoup-1.18.1.html, 032223-jsoup-1.18.3.html, 
> S53SZFZ2FBOZIVTX3HVP4D4XKHKPEMQQ.csv, filter_md5_suc_url.json, 
> reports-tika-3.0.0-v-3.1.0-rc1.tgz, reports_tika-3.0-vs-3.1.tgz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
