Aloha,
rbt wrote:
Not really a Python question... but here goes: Is there a way to read the content of a PDF file and decode it with Python? I'd like to read PDF's, decode them, and then search the data for certain strings.
First of all, http://groups.google.de/groups?selm=400CF2E3.29506EAE%40netsurf.de&output=gplain still applies here.
If you can live with a very basic implementation of a PDF library, you might be interested in http://sourceforge.net/projects/pdfplayground
In CVS (or the current snapshot), ppg/Doc/text_extract.txt contains an example of text extraction:
>>> import pdffile
>>> import pages
>>> import pdftool
>>> import zlib
>>> pf = pdffile.pdffile('../pdf-testset1/a.pdf')
>>> pp = pages.pages(pf)
>>> c = zlib.decompress(pf[pp.pagelist[0]['/Contents']].stream)
>>> op = pdftool.parse_content(c)
>>> sop = [x[1] for x in op if x[0] in ["'", "Tj"]]
>>> for a in sop:
...     print a[0]

Wishing a happy day
LOBI
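
P.S. In case the session above looks opaque: it inflates the page's content stream with zlib and keeps the operands of the text-showing operators Tj and '. Here is a rough stand-alone sketch of that idea using only the standard library, written for a current Python 3. The sample stream, the function name show_text_strings and the regular expression are made up for illustration and are not part of pdfplayground. It only copes with simple (...) literals, so treat it as a toy rather than a real extractor.

```python
import re
import zlib

# A hand-written page content stream (an assumption for illustration,
# not taken from a real PDF): it uses the two text-showing operators
# that the session above filters on, Tj and '.
raw = b"BT /F1 12 Tf 72 720 Td (Hello) Tj (World) ' ET"

# Streams stored with /Filter /FlateDecode are zlib-compressed,
# which is why the session above calls zlib.decompress().
stream = zlib.compress(raw)


def show_text_strings(compressed):
    """Return the literal strings shown by Tj or ' in a Flate stream.

    Simplified on purpose: handles only (...) literals without nested
    or escaped parentheses, and ignores TJ arrays and font encodings.
    """
    content = zlib.decompress(compressed)
    # A literal string followed by optional whitespace and Tj or '.
    pattern = re.compile(rb"\(([^()\\]*)\)\s*(?:Tj|')")
    return [m.group(1).decode("latin-1") for m in pattern.finditer(content)]


for s in show_text_strings(stream):
    print(s)
```

Searching for a string is then just a matter of testing each returned item. Real PDFs often use other filters, hex strings, TJ arrays or custom font encodings, and none of that is handled here.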
