Tim Golden wrote:

> + PDF: David Boddie's pdftools looks like about the only possibility:
> (ducks as a thousand people jump on him and point out the alternatives)

I might as well do that! Here are a couple of alternatives:

http://www.sourceforge.net/projects/pdfplayground
http://www.adaptive-enterprises.com.au/~d/software/pdffile/

Both of these are arguably more "Pythonic" than my solution, and the first
is also able to write out modified files.

Cameron Laird also maintains a page about PDF conversion tools:

http://phaseit.net/claird/comp.text.pdf/PDF_converters.html

> http://www.boddie.org.uk/david/Projects/Python/pdftools/
>
> Something like this might do the business. I'm afraid I've
> no idea how to determine where the line-breaks are. This
> was the first time I'd used pdftools, and the fact that
> I could do this much is a credit to its usability!

Thanks for the compliment! The read_text method in the PDFContents class
also lets you extract text from a given page in a document, but you have to
remember that text in PDF files isn't always composed as a series of lines
or paragraphs, and often doesn't even contain whitespace characters.

David
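
P.S. To illustrate the line-break problem: since a PDF page may hold only
positioned fragments of text, lines and spaces have to be guessed from
coordinates. The following is a rough sketch of that idea, not anything
taken from pdftools. The (x, y, text) tuples, the function name
group_into_lines, the tolerance values and the fixed character width are all
assumptions made up for the example; a real tool would need the font metrics
from the document.

```python
# Hypothetical input: (x, y, text) fragments as they might be recovered from
# a page's content stream. This is NOT pdftools output; the tuple layout,
# the tolerances and the fixed character width are illustrative assumptions.

def group_into_lines(fragments, y_tolerance=2.0, gap=1.0, char_width=5.0):
    """Guess at lines of text from positioned fragments."""
    lines = []  # each entry: [baseline_y, [(x, text), ...]]

    # PDF y coordinates grow upwards, so visit the top of the page first.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            # Close enough to the current baseline: same line.
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])

    result = []
    for _, pieces in lines:
        pieces.sort()  # left to right
        out = ""
        end_x = None
        for x, text in pieces:
            # Insert a space when the horizontal gap suggests a word break.
            if end_x is not None and x - end_x > gap:
                out += " "
            out += text
            # Crude estimate of where this fragment ends.
            end_x = x + char_width * len(text)
        result.append(out)
    return result


fragments = [
    (72.0, 700.0, "Hello"),
    (102.0, 700.0, "world"),
    (72.0, 686.0, "second"),
    (110.0, 686.5, "line"),
]
print(group_into_lines(fragments))
```

Fragments that sit flush against each other are joined without a space, which
is why the guessed gap and character width matter so much: get them wrong and
words either run together or fall apart.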