reading text in pdf, some working sample code

Daniel Gross Tue, 21 Nov 2017 08:12:12 -0800

Hi,

I am new to Python and jumped right into trying to extract (English) text
from PDF files.


I tried various libraries (including slate) but am running into a range of
problems, such as encoding errors or "buffer too small" errors raised deep
inside some decompression code.

Essentially, I want to extract all the text and then do some natural language
processing on it. Is there any working sample code available, together
with a clear description of the Python installation environment it
needs?

In slate, by the way, I got the buffer error; it seems I must "guess" the
right encoding of the text in the PDF when opening the file. I am still
trying to figure out how to get the encoding information out of the PDF (if
it is available there).
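
To make the goal concrete, here is a minimal sketch of the extract-then-process
pipeline described above. It assumes the third-party pypdf package (installed
with "pip install pypdf") rather than slate, and the names extract_text,
tokenize and top_words are hypothetical helpers chosen for illustration. A PDF
is a binary file, so no text encoding is passed when it is opened; the library
decodes the text from the fonts inside the file. Treat it as a starting point
under those assumptions, not a tested recipe for every PDF.

```python
import re
from collections import Counter


def extract_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Assumption: the third-party pypdf package is installed.
    from pypdf import PdfReader

    # PdfReader reads the file as bytes; no text encoding is supplied here.
    reader = PdfReader(pdf_path)
    # extract_text() can yield nothing for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)
```

With a real file this would be called as top_words(extract_text("example.pdf")),
where "example.pdf" is a placeholder path. The word counting only stands in for
the natural language processing step; scanned PDFs that contain only images
would need OCR instead.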

Thank you,

Daniel
-- 
https://mail.python.org/mailman/listinfo/python-list
