Re: reading text in pdf, some working sample code

dieter Tue, 21 Nov 2017 23:41:11 -0800

Daniel Gross <[email protected]> writes:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.
>
> I tried various libraries (including slate)


You could give "pdfminer" a try.
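A minimal sketch of what trying pdfminer could look like. It assumes a recent version of the maintained fork pdfminer.six is installed (pip install pdfminer.six), which provides `pdfminer.high_level.extract_text`; that function takes a file path and returns the extracted text as a string. The helper name `pdf_to_text` and the script name are hypothetical.

```python
import sys


def pdf_to_text(path):
    """Return the text that pdfminer.six extracts from the PDF at *path*."""
    # Imported inside the function so this file still loads when
    # pdfminer.six is not installed (the call then raises ImportError).
    from pdfminer.high_level import extract_text
    return extract_text(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python pdf_to_text.py some.pdf
    print(pdf_to_text(sys.argv[1]))
```

If the printed result is empty or looks like gibberish, the caveat below about character codes is the likely cause.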

Note, however, that it may not be possible to extract the text:
PDF is a generic format which works by mapping character codes to glyphs
(i.e. visual symbols). If your PDF uses a special mapping for this,
especially with non-standard glyph collections (aka "fonts"),
then text extraction (which in fact extracts sequences
of character codes) can give unusable results.
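To make the point about character codes concrete, here is a toy model in Python. It is only an illustration, not how any real PDF library works internally: it assumes a hypothetical subsetted font that numbers its glyphs in order of first use, and the `to_unicode` dict merely stands in for the optional code-to-Unicode map (a "ToUnicode" CMap) that a PDF font may or may not carry.

```python
# Toy model (not a real PDF parser): the page stores character codes,
# and the font decides which glyph each code draws.

def subset_encode(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use,
    as a hypothetical subsetted font with a custom encoding might."""
    code_for = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = len(code_for) + 1
        codes.append(code_for[ch])
    # Inverse map: code -> Unicode character (what a ToUnicode map provides).
    to_unicode = {code: ch for ch, code in code_for.items()}
    return codes, to_unicode


def naive_extract(codes):
    """Treat each code as if it were a standard (Latin-1) character code."""
    return "".join(chr(c) for c in codes)


def mapped_extract(codes, to_unicode):
    """Translate codes through the font's code-to-Unicode map;
    unknown codes become U+FFFD (the replacement character)."""
    return "".join(to_unicode.get(c, "\ufffd") for c in codes)


codes, to_unicode = subset_encode("hello world")
print(codes)                              # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]
print(repr(naive_extract(codes)))         # control characters, not text
print(mapped_extract(codes, to_unicode))  # hello world
```

Without the map, the extracted codes are exactly what the file contains, yet they no longer correspond to readable characters; that is the "unusable results" case.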

-- 
https://mail.python.org/mailman/listinfo/python-list
