[ https://issues.apache.org/jira/browse/TIKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aeham Abushwashi updated TIKA-2300:
-----------------------------------
    Attachment: TIKA-2300.patch

Here's a first stab at a patch for discussion.
PackageParser can easily figure out whether the zip is encrypted (albeit with
an ugly cast!). I figured users may not always want PackageParser to abandon
processing of encrypted zip files, so I opted to add a metadata flag indicating
that the file is encrypted. This maintains backwards compatibility with
TIKA-1028, but is it consistent with how Tika reports _partial_ success/failure
elsewhere?
Also, the change made me realise that the rich metadata PackageParser extracts
for the compressed/inner files never finds its way back up to users through the
metadata object. Is this by design?
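
For reference, a minimal sketch of the check that the encryption flag boils
down to. In the ZIP format, bit 0 of the general purpose bit flag in an entry's
local file header marks the entry as encrypted, and that is the bit
ZipArchiveEntry#getGeneralPurposeBit#usesEncryption reports. The class and
method names below are hypothetical and are not taken from the attached patch,
Tika, or Commons Compress; the sketch uses only the JDK and looks at the first
entry only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for illustration; not part of Tika or Commons Compress. */
final class ZipEncryptionFlag {
    // Local file header signature "PK\003\004", read as a little-endian int.
    private static final int LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;
    // Layout: signature (4 bytes), version needed (2 bytes), then the flags (2 bytes).
    private static final int FLAGS_OFFSET = 6;
    // Bit 0 of the general purpose bit flag: the entry is encrypted.
    private static final int ENCRYPTION_BIT = 0x0001;

    private ZipEncryptionFlag() {}

    /** Returns true if the first local file header has the encryption bit set. */
    static boolean firstEntryEncrypted(byte[] zipBytes) {
        if (zipBytes == null || zipBytes.length < FLAGS_OFFSET + 2) {
            throw new IllegalArgumentException("too short to hold a local file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_FILE_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("does not start with a local file header");
        }
        int flags = buf.getShort(FLAGS_OFFSET) & 0xFFFF;
        return (flags & ENCRYPTION_BIT) != 0;
    }
}
```

Inside PackageParser the same information would come from the entry object
rather than from raw bytes; the sketch only shows what the flag means on disk.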

> Can't tell if a zip file is encrypted
> -------------------------------------
>
>                 Key: TIKA-2300
>                 URL: https://issues.apache.org/jira/browse/TIKA-2300
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.14
>            Reporter: Aeham Abushwashi
>            Assignee: Tim Allison
>         Attachments: encrypted_file.zip, TIKA-2300.patch
>
>
> When Tika processes a zip file that is protected with a password, it returns
> the list of file names within the zip but gives no indication (as an
> exception or in metadata) that the file is encrypted.
> From stepping through the code, I can see that the information needed to
> determine whether the archive is encrypted is available via
> ZipArchiveEntry#getGeneralPurposeBit#usesEncryption, but it needs to be
> relayed back to PackageParser somehow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
