I was trying to use pyPdf following a recipe from the ActiveState cookbook. However, I cannot get it to work. I am unsure whether the problem is me or the fact that the sets module is deprecated.
I have placed a PDF on my C:\ drive called "Components-of-Dot-NET.pdf" (any PDF would do; I was just testing with this one). I was using the last script on this page, the most recently updated one, with Python 2.6:

http://code.activestate.com/recipes/511465-pure-python-pdf-to-text-converter/

    import pyPdf

    def getPDFContent(path):
        content = "C:\Components-of-Dot-NET.pdf"
        # Load PDF into pyPDF
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
        # Iterate pages
        for i in range(0, pdf.getNumPages()):
            # Extract text from page and add to content
            content += pdf.getPage(i).extractText() + "\n"
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
        return content

    print getPDFContent("Components-of-Dot-NET.pdf").encode("ascii", "ignore")

This is my error:

    >>>
    Warning (from warnings module):
      File "C:\Documents and Settings\Family\Application Data\Python\Python26\site-packages\pyPdf\pdf.py", line 52
        from sets import ImmutableSet
    DeprecationWarning: the sets module is deprecated

    Traceback (most recent call last):
      File "C:/Python26/Pdfread", line 15, in <module>
        print getPDFContent("Components-of-Dot-NET.pdf").encode("ascii", "ignore")
      File "C:/Python26/Pdfread", line 6, in getPDFContent
        pdf = pyPdf.PdfFileReader(file(path, "rb"))
    IOError: [Errno 2] No such file or directory: 'Components-of-Dot-NET.pdf'
    >>>
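
For reference, the IOError names a bare file name, which Python looks up relative to the current working directory, not C:\. The following is a minimal sketch of one way to check that, assuming the PDF really sits directly under C:\. The helper name resolve_pdf_path is hypothetical and is not part of pyPdf or the recipe.

```python
import os


def resolve_pdf_path(name, base_dir):
    """Join a bare file name onto base_dir and confirm the file exists.

    Hypothetical helper for illustration only; it is not part of pyPdf.
    """
    full_path = os.path.join(base_dir, name)
    if not os.path.isfile(full_path):
        # Report the current directory too, since a relative name
        # would have been looked up there.
        raise IOError("No such file: %r (current directory is %r)"
                      % (full_path, os.getcwd()))
    return full_path
```

If the assumption holds, something like getPDFContent(resolve_pdf_path("Components-of-Dot-NET.pdf", "C:\\")) would pass an absolute path to the recipe's function. The doubled backslash (or a raw string) avoids accidental escape sequences in the Windows path.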