[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228913#comment-14228913 ]
Andreas Lehmkühler commented on TIKA-1442:
------------------------------------------

I've improved the self-repair mechanism of the non-sequential parser and fixed some of the regressions introduced in 1.8.7 and the current 1.8 branch.

[~talli...@apache.org] Do you have some time to run another test? I'd like to cut the 1.8.8 release soon, and I'm eager to know whether my latest changes introduced any new regressions. The latest [build|https://builds.apache.org/job/PDFBox%201.8.x/347/] includes all changes; [156|http://repository.apache.org/content/repositories/snapshots/org/apache/pdfbox/pdfbox-app/1.8.8-SNAPSHOT/pdfbox-app-1.8.8-20141129.191713-156.jar] is the correct SNAPSHOT version. It would be great if you could run both comparisons: 1.8.6 vs. 1.8.8, and the classic vs. the non-sequential parser. I guess 50k PDFs should be sufficient.

Thanks in advance :-)

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.8
>
>         Attachments: PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx,
> PDFBox_1_8_6VPDFBox_1_8_8-b145.zip,
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx,
> pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx,
> pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to
> 1.8.8 as soon as it is ready. I'm tempted to call this a blocker on Tika
> 1.7. Let's use this issue to carry on the discussion of regression testing
> (if any further discussion is necessary) or any other prep that needs to
> happen before 1.8.8's release.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)