[ 
https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181779#comment-14181779
 ] 

Tilman Hausherr commented on TIKA-1442:
---------------------------------------

Thanks!

I'm slowly getting started, and here's the first finding: 892/892848.pdf is
encrypted and does not grant text-extraction permission. Yet its row in the
Excel file does contain tokens, which is surprising.

With the "old" parser, use the code below, because files are sometimes
encrypted with an empty password:
{code}
            if( document.isEncrypted() )
            {
                try
                {
                    StandardDecryptionMaterial sdm = new StandardDecryptionMaterial("");
                    document.openProtection(sdm);
                }
                catch( InvalidPasswordException e )
                {
                    System.err.println( "Error: The document is encrypted." );
                }
            }
{code}
The nonSeq parser does this automatically.
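
To make the "no text extract permission" point concrete: in the PDF standard security handler, permissions are stored as a 32-bit flag word (the /P entry of the encryption dictionary), and bit 5 (counting from 1) covers copying or extracting text. The following is a minimal, stdlib-only sketch of that bit test. The class and method names are hypothetical and this is not PDFBox or Tika code; it only illustrates the flag that would explain why tokens from this file are unexpected.

```java
/**
 * Hypothetical helper, not part of PDFBox or Tika: decodes the
 * "extract text" flag from a PDF permissions word (the /P entry).
 */
public class ExtractPermissionSketch {

    // Bit 5 (1-based) of /P: copy or extract text and graphics.
    private static final int EXTRACT_BIT = 5;

    /** Returns true if the given 1-based bit is set in the flag word. */
    public static boolean isBitSet(int permissions, int bit) {
        return (permissions & (1 << (bit - 1))) != 0;
    }

    /** Returns true if the flag word allows text extraction. */
    public static boolean canExtractText(int permissions) {
        return isBitSet(permissions, EXTRACT_BIT);
    }

    public static void main(String[] args) {
        int allAllowed = -1;        // every bit set
        int noExtract = ~(1 << 4);  // every bit set except bit 5
        System.out.println(canExtractText(allAllowed));
        System.out.println(canExtractText(noExtract));
    }
}
```

A tool that honors the flag would skip text extraction whenever `canExtractText` returns false for the document's permissions word, even if the empty password opens the file.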

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.7
>
>         Attachments: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, 
> pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 
> 1.8.8 as soon as it is ready.  I'm tempted to call this a blocker on Tika 
> 1.7.  Let's use this issue to carry on the discussion of regression testing 
> (if any further discussion is necessary) or any other prep that needs to 
> happen before 1.8.8's release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
