[jira] [Commented] (TIKA-1442) Upgrade to PDFBox 1.8.8

Tilman Hausherr (JIRA) Tue, 25 Nov 2014 10:59:47 -0800

    [ 
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225008#comment-14225008
 ]


Tilman Hausherr commented on TIKA-1442:
---------------------------------------

Thanks Tim!

892848.pdf and 892859.pdf should return nothing because they have no extract 
permission, yet the table shows 1000s of tokens for them? PDFBox ExtractText 
throws an IOException saying there is no text extraction permission.
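
For context, a minimal sketch of how the extract permission is encoded, 
assuming the standard security handler's /P flags as described in the PDF 
specification, where bit 5 (counting from 1) is "copy or extract text and 
graphics". The class and method names are hypothetical; this is not the PDFBox 
or Tika code, only an illustration of the bit test:

```java
// Hypothetical helper for illustration only; not part of PDFBox or Tika.
final class ExtractPermissionSketch {
    // Bit 5 (1-based) of the /P entry: copy or extract text and graphics.
    private static final int EXTRACT_BIT = 5;

    // Returns true if the extract bit is set in the given /P flags value.
    static boolean canExtract(int permissionFlags) {
        return (permissionFlags & (1 << (EXTRACT_BIT - 1))) != 0;
    }
}
```

A flags value of -1 (all bits set) allows extraction; clearing bit 5 
(-1 & ~16) does not. A tool that honors the flag would refuse to return text 
in the second case.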

357567.pdf, 267739.pdf and 686183.pdf are unfixed regressions (PDFBOX-2421 and 
PDFBOX-2449).

Not PDFs:
196/196578.pdf
371/371231.pdf
879/879483.pdf
892/892042.pdf
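
A quick way to screen for files like these is to look for the "%PDF-" marker 
at the start of the file. The sketch below is an assumption about how one 
might do that, with hypothetical class and method names. It is deliberately 
strict: lenient parsers also accept a header that appears a little later in 
the file, and this check would reject such files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of PDFBox or Tika.
final class PdfHeaderSketch {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the given bytes begin with the "%PDF-" marker at offset 0.
    static boolean startsWithPdfHeader(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first few bytes of the file and applies the strict check.
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the marker
                }
                read += n;
            }
        }
        return read == head.length && startsWithPdfHeader(head);
    }
}
```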

890238.pdf is a regression (IllegalBlockSizeException), but only with the old 
parser. I think this one was mentioned elsewhere.

474863.pdf is also a regression (IllegalBlockSizeException), with both parsers.

more to come...





> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.8
>
>         Attachments: PDFBox_1_8_6VPDFBox_1_8_8-b145.xlsx, 
> PDFBox_1_8_8-ClassicVPDFBox_1_8_8-NonSeq.xlsx, 
> pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx, 
> pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 
> 1.8.8 as soon as it is ready.  I'm tempted to call this a blocker on Tika 
> 1.7.  Let's use this issue to carry on the discussion of regression testing 
> (if any further discussion is necessary) or any other prep that needs to 
> happen before 1.8.8's release.



