[yys2000] > I want to compare two PDF or WORD files.
Could you be more precise, please?

Do you only want to compare PDF with PDF, or Word with Word? Or do you want to be able to compare a PDF with a Word document?

In either case, do you only care about the text, or is the formatting significant?

If it's only the text, use whatever method you like to extract it (antiword, Ghostscript, COM automation, xpdf, etc.) and then compare the results with the difflib module or an external diff tool.

If you want to compare structure or formatting, you're into quite difficult territory, I believe. It's easy enough to convert a Word document to PDF if that were needed, but PDFs are notoriously difficult to disentangle, although relatively straightforward to build. There's pdftools (http://www.boddie.org.uk/david/Projects/Python/pdftools/), which I can't say I've tried, but even once you've got the document object into Python, I don't imagine it'll be easy to compare.

For Word-to-Word comparison there's more hope on the horizon (if that's the metaphor I want). Word has built-in comparison functionality, and recent versions of TortoiseSVN, for example, include a script that automates Word to do the right thing: essentially, open one document and call its .Compare method against the other.

TJG
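Here is a minimal sketch of the text-only route, assuming the text has already been pulled out of each file. The `extract_pdf_text` helper assumes the `pdftotext` command-line tool (from xpdf or Poppler) is installed and on the PATH. All function names here are hypothetical, chosen for illustration. The comparison itself is plain difflib from the standard library:

```python
import difflib
import subprocess


def extract_pdf_text(path):
    """Hypothetical helper: extract text from a PDF via the pdftotext CLI.

    Assumes pdftotext (xpdf/Poppler) is installed; "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_extracted_text(old_text, new_text, old_name="a", new_name="b"):
    """Return unified-diff lines comparing two already-extracted texts."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # splitlines() already stripped the newlines
    ))


def similarity(old_text, new_text):
    """Rough 0.0-1.0 similarity score between the two texts."""
    return difflib.SequenceMatcher(None, old_text, new_text).ratio()


if __name__ == "__main__":
    old = "The quick brown fox\njumps over\nthe lazy dog\n"
    new = "The quick brown fox\nleaps over\nthe lazy dog\n"
    for line in diff_extracted_text(old, new, "report_v1.txt", "report_v2.txt"):
        print(line)
    print("similarity:", round(similarity(old, new), 3))
```

For two PDFs you'd call something like `diff_extracted_text(extract_pdf_text("a.pdf"), extract_pdf_text("b.pdf"), "a.pdf", "b.pdf")`. Diffing line by line keeps the output readable, but any reflowing of paragraphs between versions will show up as changes even when the words are the same.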