Tilman Hausherr created TIKA-1489:
-------------------------------------

             Summary: PDF Text extraction without permission
                 Key: TIKA-1489
                 URL: https://issues.apache.org/jira/browse/TIKA-1489
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.7
            Reporter: Tilman Hausherr


In TIKA-1442, text extraction works for files such as 717226.pdf even though 
they do not grant text extraction permission. PDF permissions are enforced 
only by the application (here, PDFBox); the text itself is not stored 
separately in encrypted form. 
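
To illustrate why this is purely an application-side check: the permissions 
are a set of flag bits in the /P entry of the PDF encryption dictionary, and 
bit 5 (1-based, value 16) covers copying or extracting text and graphics. The 
following is a minimal sketch of such a check, not the actual PDFBox or Tika 
code; the class name, method names and the sample /P value are hypothetical 
and chosen only for illustration.

```java
// Hypothetical sketch: decode the "extract content" flag from a PDF /P value.
// This is NOT the PDFBox implementation, only the same idea in isolation.
public final class PdfPermissionBits {

    // Bit 5 (1-based) of /P: copy or otherwise extract text and graphics.
    private static final int EXTRACT_BIT = 5;

    private PdfPermissionBits() {
    }

    // Returns true if the given 1-based bit is set in the permission flags.
    public static boolean isBitSet(int permissions, int bit) {
        return (permissions & (1 << (bit - 1))) != 0;
    }

    // Returns true if the flags allow text extraction.
    public static boolean canExtractContent(int permissions) {
        return isBitSet(permissions, EXTRACT_BIT);
    }

    public static void main(String[] args) {
        int restricted = -3904; // assumed sample value with bits 3 to 6 cleared
        int unrestricted = -1;  // all bits set

        System.out.println("restricted:   " + canExtractContent(restricted));
        System.out.println("unrestricted: " + canExtractContent(unrestricted));
    }
}
```

A caller that never consults this flag before reading the text would extract 
content regardless of the permission setting, which may be what happens here.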

The PDFBox ExtractText command-line tool does throw an exception in this case, 
so I wonder why Tika is able to extract the text. Either Tika itself or the 
PDFBox call it uses bypasses the permission check.



