[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225988#comment-14225988 ]
Nick Burch commented on TIKA-1489:
----------------------------------

I would consider this a feature rather than a bug!

Generally though, if you give Tika a valid password, or the default password works, then Tika will extract the text. If you can't give the right password, then Tika won't extract the text. In this case, it seems that the default password works, so Tika uses it.

You need to use a PasswordProvider - https://tika.apache.org/1.6/api/org/apache/tika/parser/PasswordProvider.html - to supply non-standard passwords for protected / encrypted documents.

> PDF Text extraction without permission
> --------------------------------------
>
>                 Key: TIKA-1489
>                 URL: https://issues.apache.org/jira/browse/TIKA-1489
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Tilman Hausherr
>
> In TIKA-1442 text extraction from files like 717226.pdf that don't have text
> extraction permission works. The permissions in PDF files are only enforced
> by the application (i.e. PDFBox), i.e. the text information isn't stored
> separately in encrypted form.
> PDFBox ExtractText command line does throw an exception.
> So I wonder why TIKA is able to extract text. Either TIKA or the PDFBox call
> used bypasses the permission checking.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
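
To make the rule in the comment concrete, here is a small self-contained sketch in plain Java. It does not use Tika or PDFBox: `PasswordSource`, `ProtectedDoc`, and `extract` are hypothetical names invented for illustration, and the default password is assumed to be the empty string. It only models the behaviour described in the comment: use a supplied password if there is one, otherwise fall back to the default, and extract text only if the document opens.

```java
import java.util.Optional;

// Toy model only; none of these types are Tika or PDFBox classes.
class PasswordFallbackSketch {

    /** Hypothetical stand-in for a password callback. */
    interface PasswordSource {
        String passwordFor(String documentName);
    }

    /** Toy protected document that opens only with its user password. */
    static final class ProtectedDoc {
        final String name;
        final String userPassword;
        final String text;

        ProtectedDoc(String name, String userPassword, String text) {
            this.name = name;
            this.userPassword = userPassword;
            this.text = text;
        }

        boolean opensWith(String candidate) {
            return userPassword.equals(candidate);
        }
    }

    // Assumption: the "default password" is the empty string.
    static final String DEFAULT_PASSWORD = "";

    /** Returns the text if the chosen password opens the document, else empty. */
    static Optional<String> extract(ProtectedDoc doc, PasswordSource source) {
        String candidate = (source == null) ? null : source.passwordFor(doc.name);
        if (candidate == null) {
            candidate = DEFAULT_PASSWORD; // nothing supplied, fall back to the default
        }
        return doc.opensWith(candidate) ? Optional.of(doc.text) : Optional.empty();
    }

    public static void main(String[] args) {
        ProtectedDoc defaultPw = new ProtectedDoc("default.pdf", "", "hello");
        ProtectedDoc customPw = new ProtectedDoc("custom.pdf", "secret", "hidden");

        // Default password works, so text comes out without any callback.
        System.out.println(extract(defaultPw, null).isPresent());
        // Non-standard password and no callback: nothing is extracted.
        System.out.println(extract(customPw, null).isPresent());
        // Non-standard password supplied through the callback.
        System.out.println(extract(customPw, name -> "secret").isPresent());
    }
}
```

In Tika itself the callback is the `PasswordProvider` interface linked in the comment, which, as far as I know, is handed to the parser through the `ParseContext`. The sketch leaves out the permission flags that the original report is about.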