Axel Straschil <[EMAIL PROTECTED]> writes:
> Hello!
>
>> However, our company's product, PDFTextStream does do a phenomenal
>> job of extracting text and metadata out of PDF documents. It's
>> crazy-fast, has a clean API, and in general gets the job done very
>> nicely. It presents two points of compromise from your ideal
>> situation:
>> 1. It only produces text, so you would have to take the text it
>> provides and write it out as an RTF yourself (there are tons of
>> packages and tools that do this). Since the RTF format has pretty
>> weak formatting capabilities compared
>
> I've got the input source in HTML; the problem is converting from any
> format to RTF. Please give me a hint where the tons of packages are.

That's easy. Load the HTML in MS Word and save it as RTF. You can script
this via COM using the pywin32 package for Python (formerly known as
win32all).

        <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>          http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
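
P.S. For reference, here is a minimal sketch of the Word-via-COM approach suggested above. It assumes Windows with MS Word and pywin32 installed. The helper names rtf_path_for and html_to_rtf are made up for illustration, and 6 is the value of wdFormatRTF in Word's WdSaveFormat enumeration. Treat it as a starting point rather than a tested recipe for any particular Word version.

```python
import os

# Value of wdFormatRTF in Word's WdSaveFormat enumeration.
WD_FORMAT_RTF = 6


def rtf_path_for(html_path):
    """Return the absolute output path: same base name, .rtf extension."""
    base, _ext = os.path.splitext(os.path.abspath(html_path))
    return base + ".rtf"


def html_to_rtf(html_path):
    """Open html_path in Word via COM and save it as RTF next to the source.

    Hypothetical helper: requires Windows, MS Word, and pywin32.
    """
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    out_path = rtf_path_for(html_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Pass absolute paths; Word resolves relative ones against its own
        # working directory, not the script's.
        doc = word.Documents.Open(os.path.abspath(html_path))
        try:
            # Second positional argument of SaveAs is the file format.
            doc.SaveAs(out_path, WD_FORMAT_RTF)
        finally:
            doc.Close(False)  # close without prompting to save again
    finally:
        word.Quit()
    return out_path
```

Calling html_to_rtf(r"C:\docs\page.html") should leave page.rtf in the same directory, provided Word can open the HTML file.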