On Wed, Jan 5, 2011 at 4:45 PM, Emile van Sebille <em...@fenx.com> wrote:
> On 1/5/2011 3:12 PM kanth...@woh.rr.com said...
>> I want to use Python to find all "\n" terminated
>> strings in a PDF file, ideally returning string
>> starting addresses. Anyone willing to help?
>
> pdflines = open(r'c:\shared\python_book_01.pdf').readlines()
> sps = [0]
> for ii in pdflines: sps.append(sps[-1]+len(ii))
>
> Emile
>
> --
> http://mail.python.org/mailman/listinfo/python-list

Bear in mind that PDF files often have compressed objects in them. If that is the case, I would recommend opening the PDF in binary mode and working out how to decompress (inflate) the relevant objects before doing any searching. PyPDF is a package that might help with this, though it could use some updating.
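
To make that concrete, here is a rough sketch of the idea using only the standard library (Python 3 syntax). The helper names `line_offsets` and `inflate_streams` are made up for illustration, and the stream detection is a deliberate simplification: it ignores the /Length and /Filter entries, predictors, object streams, and encryption, so treat it as a starting point rather than a PDF parser.

```python
import re
import zlib

# Hypothetical helper: offsets are byte positions within `data`.
def line_offsets(data):
    """Return the start offset of every b'\\n'-terminated line in data."""
    offsets = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break  # trailing bytes without a newline are not reported
        offsets.append(start)
        start = end + 1
    return offsets

# Simplified assumption: a stream body sits between the "stream" and
# "endstream" keywords. A real parser would use the /Length entry instead.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# Hypothetical helper: only zlib/Flate-compressed streams are handled.
def inflate_streams(data):
    """Yield (file_offset, inflated_bytes) for each stream zlib can inflate."""
    for match in _STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates stray bytes after the compressed data
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (or another filter); skip it
        yield match.start(1), inflated
```

You would read the file with `open(path, 'rb').read()`, call `line_offsets` on the raw bytes for the uncompressed parts, and call it again on each chunk yielded by `inflate_streams`. Note that offsets inside an inflated chunk are relative to that chunk, not to the file.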