On Sat, 2012-01-28 at 21:59 -0800, Chris Rebert wrote:
> On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <clt...@gmail.com> wrote:
> > I have a very large Adobe PDF file. I was hoping to use a script to
> > extract the information for it. Is there a way to loop through a PDF
> > file using Python?
> Haven't used it myself, but:
> http://www.unixuser.org/~euske/python/pdfminer/
It is very prone to hanging and/or crashing. I haven't yet found a really
reliable way to read text from a PDF. PyPDF provides a PdfFileReader class
with an extractText method. The output is indeed the text, although it can
be a bit thorny to look at.

> > Or do I need to find a way to convert a PDF file into a text file? If
> > so how?
> The pdf2txt.py script from the same package happens to do exactly this.

--
System & Network Administrator [ LPI & NCLA ]
<http://www.whitemiceconsulting.com>
OpenGroupware Developer <http://www.opengroupware.us>
Adam Tauno Williams
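
For reference, looping over a PDF with the PdfFileReader / extractText
approach mentioned above might look roughly like the sketch below. It
assumes the legacy pyPdf package is installed (older PyPDF2 releases kept
the same class and method names, as far as I know); the helper names
iter_page_text and extract_pdf_text are made up for illustration and are
not part of the library.

```python
def iter_page_text(reader):
    """Yield (page_number, text) pairs from a PdfFileReader-like object.

    Only getNumPages(), getPage() and extractText() are used, so any
    object offering those three methods will work here.
    """
    for index in range(reader.getNumPages()):
        page = reader.getPage(index)
        # extractText() returns the raw text; whitespace and line breaks
        # are often mangled, so expect to clean the result up afterwards.
        yield index + 1, page.extractText()


def extract_pdf_text(path):
    """Return a list of (page_number, text) pairs for the PDF at `path`."""
    # Assumption: the legacy pyPdf package is available. The import is
    # kept local so the page loop above can be reused without it.
    from pyPdf import PdfFileReader

    # The file has to stay open while pages are read, so the list is
    # built inside the `with` block.
    with open(path, "rb") as stream:
        return list(iter_page_text(PdfFileReader(stream)))
```

Calling extract_pdf_text("big.pdf") (the file name is just a placeholder)
should give one entry per page, which makes it easy to skip or retry
individual pages if one of them turns out to be troublesome.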