[ https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated TIKA-1419:
----------------------------------
    Attachment: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.xlsx

Here's an Excel file. In the new column on the right, I noted which files improved as a result of solving the three related PDFBox issues above. I mostly tested the files that had fewer tokens. I also tested a few that had more tokens, and there the results are inconclusive: some have improved, and some had more tokens because of a regression that has now been solved.

Would it be possible, next time, to test with the same set of files, and to compare 1.8.8 against 1.8.6 rather than against 1.8.7? The reason is that if there is an unknown regression in 1.8.7 that remains unsolved, 1.8.8 would look as if it had the same quality, when in fact it does not.

> Upgrade to PDFBox 1.8.7
> -----------------------
>
>                 Key: TIKA-1419
>                 URL: https://issues.apache.org/jira/browse/TIKA-1419
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.csv, compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.xlsx
>
>
> Will run against govdocs1 early next week and then upgrade if no major regressions are found.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)