Daniel Gross <gross...@gmail.com> writes:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.
>
> I tried various libraries (including slate)

You could give "pdfminer" a try.
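
A minimal sketch of what that could look like, assuming the maintained
"pdfminer.six" fork is installed (pip install pdfminer.six) and that its
pdfminer.high_level.extract_text helper is available. The file name
"document.pdf" and the printable_ratio helper are made up for illustration;
the ratio is only a rough hint of whether the output is usable.

```python
def extract_pdf_text(path):
    """Return whatever text pdfminer can recover from the PDF at `path`."""
    # Assumption: pdfminer.six is installed. Imported lazily so this module
    # still loads on a machine without it.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def printable_ratio(text):
    """Hypothetical helper: share of characters that are printable or whitespace."""
    if not text:
        return 0.0
    good = sum(1 for ch in text if ch.isprintable() or ch.isspace())
    return good / len(text)


if __name__ == "__main__":
    text = extract_pdf_text("document.pdf")  # hypothetical file name
    print(text[:500])
    print("printable ratio:", printable_ratio(text))
```

A low ratio suggests that the extraction returned mostly raw codes rather
than readable text.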

Note, however, that it may not be possible to extract the text:
PDF is a generic format which works by mapping character codes to glyphs
(i.e. visual symbols). If your PDF uses a special map for this
(especially with nonstandard glyph collections, aka "fonts"),
then the text extraction (which in fact extracts sequences
of character codes) can give unusable results.
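
To illustrate the problem, here is a toy model (not real PDF parsing, and
not how pdfminer works internally). A "font" is reduced to a dict from
character code to glyph, and the names STANDARD_FONT, CUSTOM_FONT, render
and naive_extract are all invented for this sketch.

```python
# Toy model: a "font" maps character codes to glyphs.
STANDARD_FONT = {72: "H", 105: "i"}  # codes happen to coincide with ASCII
CUSTOM_FONT = {1: "H", 2: "i"}       # arbitrary codes, as a special map might use


def render(codes, font):
    """What a viewer shows: each code looked up in the font's glyph map."""
    return "".join(font[code] for code in codes)


def naive_extract(codes):
    """What an extractor gets if it can only see the character codes."""
    return "".join(chr(code) for code in codes)


if __name__ == "__main__":
    # Both pages look the same on screen.
    print(render([72, 105], STANDARD_FONT))
    print(render([1, 2], CUSTOM_FONT))
    # Only the first one extracts to readable text.
    print(repr(naive_extract([72, 105])))
    print(repr(naive_extract([1, 2])))
```

Both code sequences display as the same glyphs, but only the one whose codes
match a standard encoding comes out as readable text.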

-- 
https://mail.python.org/mailman/listinfo/python-list