[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230835#comment-14230835 ]
Tim Allison commented on TIKA-1442:
-----------------------------------

057084.pdf was one of the files with JSON errors. I dug into that a bit. I kicked off the process, didn't think it went anywhere and killed it. It turns out that the child processes had already opened streams to a handful of files. Next time I reran the process, those 0-byte files were skipped, leading to the JSON error.

301125 is bizarre. When I re-built the Tika-app, I didn't do a full build of all of the modules, and the app component didn't pick up the changed dependency in the parsers pom.xml. I can't figure out, though, why 1.8.8 was empty.

I'm redoing these runs. Sorry about that!

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.8
>
>         Attachments: PDFBox_1_8_6DVPDFBox_1_8_8-TRAD-b156.xlsx,
> PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx, PDFBox_1_8_6VPDFBox_1_8_8-b145.zip,
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx,
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx,
> PDFBox_1_8_8-TRADVPDFBox_1_8_8-NONSEQ-b156.xlsx,
> pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx,
> pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to
> 1.8.8 as soon as it is ready. I'm tempted to call this a blocker on Tika
> 1.7. Let's use this issue to carry on the discussion of regression testing
> (if any further discussion is necessary) or any other prep that needs to
> happen before 1.8.8's release.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)