Hi, I am new to Python and jumped right into trying to extract (English) text from PDF files.
I tried various libraries (including slate) but keep running into problems, such as encoding errors or "buffer too small" errors raised deep inside some decompression code. Essentially, I want to extract all the text and then do some natural language processing on it.

Is there sample code available that works, together with a clear description of the Python installation environment it needs?

In slate, by the way, I got the buffer error. It seems I must "guess" the right encoding of the text contained in the PDF when opening the file. I am still trying to figure out how to get the encoding information out of the PDF (if it is available there).

Thank you,
Daniel
--
https://mail.python.org/mailman/listinfo/python-list
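
To make the "guessing" of the encoding concrete, here is a minimal sketch using only the standard library. It is not taken from slate or any other PDF library: the helper name `decode_with_fallback` and the candidate list are hypothetical, chosen purely for illustration, and the sketch assumes the raw bytes of a text string have already been pulled out of the PDF by some other means.

```python
from typing import Iterable, Tuple

# Hypothetical candidate list (an assumption, not something the PDF tells us).
# latin-1 maps every byte to a character, so it never fails and goes last.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(
    raw: bytes, encodings: Iterable[str] = CANDIDATES
) -> Tuple[str, str]:
    """Try each encoding in order and return (text, encoding_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    # No candidate worked: decode leniently so the caller still gets text,
    # with undecodable bytes replaced by U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


if __name__ == "__main__":
    text, used = decode_with_fallback("café".encode("cp1252"))
    print(used, text)
```

A successful decode only shows that the bytes are valid in that encoding, not that the guess is correct, so the result is probably still worth checking by eye before feeding it into any language processing.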