Thanks Steven,

Actually, I wanted to do text processing for my office, where I can view all the files in the system and use the first three to give a summary of each document, instead of having somebody enter the summary by hand. It seems there is no single piece of code that can act as a converter across formats, so I'll have to check out converters for the individual formats.
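
For illustration, here is a minimal sketch of the summary step. It is not a real converter. It assumes "the first three" means the first three non-blank lines, and it applies the quick-and-dirty idea from the quoted message below (drop the binary bytes, keep the text) to every file regardless of format. The names strip_binary, summarize and summarize_tree are hypothetical, chosen only for this example.

```python
import os
import string

# Assumption: "text characters" means printable ASCII (letters, digits,
# punctuation and whitespace). Every other byte is treated as binary noise.
PRINTABLE = set(string.printable.encode("ascii"))


def strip_binary(data):
    """Quick-and-dirty filter: keep only the printable ASCII bytes."""
    return bytes(b for b in data if b in PRINTABLE).decode("ascii")


def summarize(path, n=3):
    """Return the first n non-blank lines of a file, joined into one line."""
    with open(path, "rb") as f:
        text = strip_binary(f.read())
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return " ".join(lines[:n])


def summarize_tree(root, n=3):
    """Map the path of every file under root to its summary."""
    summaries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            summaries[path] = summarize(path, n)
    return summaries
```

Calling summarize_tree on a directory returns a dictionary of path to summary. For DOC and PDF files the filtered text will probably still contain a lot of noise, so a dedicated converter for each format would most likely have to replace strip_binary for those cases.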

Thanks and Regards,
Gaurav Agarwal

Steven D'Aprano wrote:
> On Tue, 04 Jul 2006 06:32:13 -0700, Gaurav Agarwal wrote:
>
> > Hi,
> >
> > I wanted a script that can convert any file format (RTF/DOC/HTML/PDF/PS
> > etc) to text format.
>
> RTF, HTML and PS are already text format.
>
> DOC is a secret, closed proprietary format. It will be a lot of work
> reverse-engineering it. Perhaps you should consider using existing tools
> that already do it -- see, for example, the word processors Abiword and
> OpenOffice. They are open-source, so you can read and learn from their
> code. Alternatively, you could try some of the suggestions here:
>
> http://www.linux.com/article.pl?sid=06/02/22/201247
>
> Or you could just run through the .doc file, filtering out binary
> characters, and display just the text characters. That's a quick-and-dirty
> strategy that might help.
>
> PDF is (I believe) a compressed, binary format of PS. Perhaps you should
> look at the program pdf2ps -- maybe it will help.
>
> If you explain your needs in a little more detail, perhaps people can give
> you answers which are a little more helpful.
>
> --
> Steven.

--
http://mail.python.org/mailman/listinfo/python-list