On Sep 25, 3:02 pm, Paul Hankin <[EMAIL PROTECTED]> wrote:
> Googling for 'pdf to text python' and following the first link
> gives http://pybrary.net/pyPdf/

It doesn't work that well; I've tried it, and you should too. The author
even admits this:

    extractText()
    Locate all text drawing commands, in the order they are provided in
    the content stream, and extract the text. This works well for some
    PDF files, but poorly for others, depending on the generator used.
    This will be refined in the future. Do not rely on the order of text
    coming out of this function, as it will change if this function is
    made more sophisticated.

    - source http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
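
For anyone who wants to try it on their own files, here is a minimal sketch of per-page extraction. It assumes the legacy pyPdf API (PdfFileReader, getNumPages, getPage, extractText). The helper names extract_pages and extract_pages_from_path are made up for illustration and are not part of pyPdf.

```python
def extract_pages(reader):
    """Return a list holding the extracted text of each page.

    `reader` is assumed to behave like pyPdf's PdfFileReader, i.e. it
    offers getNumPages() and getPage(index), and each page offers
    extractText().
    """
    pages = []
    for index in range(reader.getNumPages()):
        # extractText() follows the order of the text drawing commands
        # in the content stream, so do not rely on reading order.
        pages.append(reader.getPage(index).extractText())
    return pages


def extract_pages_from_path(path):
    """Hypothetical convenience wrapper: open `path` and extract every page."""
    # Imported lazily so extract_pages() can be tried without pyPdf installed.
    from pyPdf import PdfFileReader

    with open(path, "rb") as stream:
        return extract_pages(PdfFileReader(stream))
```

Running this over PDFs from a few different generators and looking at the text it returns should show quickly whether the output is usable for your documents.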