[ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230835#comment-14230835 ]

Tim Allison commented on TIKA-1442:
-----------------------------------

057084.pdf was one of the files with JSON errors.  I dug into that a bit.  I 
kicked off the process, thought it hadn't gotten anywhere, and killed it.  It 
turns out that the child processes had already opened streams to a handful of 
files.  The next time I reran the process, those 0-byte files were skipped, 
leading to the JSON error.

301125 is bizarre.  When I rebuilt tika-app, I didn't do a full build of all of 
the modules, so the app didn't pick up the changed dependency in the parsers 
pom.xml.  I still can't figure out, though, why 1.8.8 was empty.

I'm redoing these runs.  Sorry about that!



> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.8
>
>         Attachments: PDFBox_1_8_6DVPDFBox_1_8_8-TRAD-b156.xlsx, 
> PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx, PDFBox_1_8_6VPDFBox_1_8_8-b145.zip, 
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx, 
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx, 
> PDFBox_1_8_8-TRADVPDFBox_1_8_8-NONSEQ-b156.xlsx, 
> pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx, 
> pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 
> 1.8.8 as soon as it is ready.  I'm tempted to call this a blocker on Tika 
> 1.7.  Let's use this issue to carry on the discussion of regression testing 
> (if any further discussion is necessary) or any other prep that needs to 
> happen before 1.8.8's release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)