On 2017-11-21, Daniel Gross <gross...@gmail.com> wrote:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.

That's not a trivial task. However, I have just released pycpdf, which might help you out. See https://github.com/jribbens/pycpdf, which shows an example of extracting text from PDFs. It may or may not cope with the particular PDFs you have, as there's quite a lot of variety within the format.

Example:

    import pycpdf

    # Read the whole file into memory and parse it
    with open("file.pdf", "rb") as f:
        pdf = pycpdf.PDF(f.read())
    # The document information dictionary is optional
    if pdf.info and pdf.info.get('Title'):
        print('Title:', pdf.info['Title'])
    # Print the extracted text of each page, numbering pages from 1
    for pageno, page in enumerate(pdf.pages):
        print('Page', pageno + 1)
        print(page.text)
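
Since it may not cope with every PDF you throw at it, you may want to wrap the extraction so that one awkward file doesn't stop a whole batch. A rough sketch only: extract_text is a hypothetical helper, not part of pycpdf, and pdf_class is assumed to behave like pycpdf.PDF in the example above (built from the file's bytes, with .pages whose items have .text). I'm not assuming which exception types are raised on a bad file, so it catches Exception broadly.

```python
import sys


def extract_text(path, pdf_class):
    """Return a list of page texts, or None if the file cannot be parsed.

    pdf_class is assumed to be used like pycpdf.PDF: constructed from the
    raw bytes of the file, exposing .pages whose items have a .text attribute.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        pdf = pdf_class(data)
        return [page.text for page in pdf.pages]
    except Exception as exc:
        # Broad catch on purpose: the specific exception types are not assumed.
        print("Could not extract text from %s: %s" % (path, exc),
              file=sys.stderr)
        return None
```

You would call it as extract_text("file.pdf", pycpdf.PDF) after importing pycpdf, and skip any file for which it returns None.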