I suspect you will have to process those formats separately. The good news, at least for .doc files, is that the Python Cookbook, 2nd edition, has a recipe that does what you want for MS Word documents and another recipe that does it for OpenOffice documents.
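To give a rough idea of what handling the two formats separately can look like, here is a minimal sketch. It is not the Cookbook code. The function names openoffice_to_text and word_to_text are my own, and I am assuming an OpenOffice document is a zip archive with its text in content.xml, and that the Word side runs on Windows with Word and PyWin32 installed.

```python
import os
import zipfile
import xml.etree.ElementTree as ET


def openoffice_to_text(path):
    """Return the text of an OpenOffice document (.sxw or .odt).

    Assumes the file is a zip archive whose text lives in content.xml.
    """
    with zipfile.ZipFile(path) as archive:
        content = archive.read("content.xml")
    root = ET.fromstring(content)
    paragraphs = []
    for elem in root.iter():
        # Compare the tag name without its XML namespace prefix.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in ("p", "h"):  # paragraphs and headings
            paragraphs.append("".join(elem.itertext()))
    return "\n".join(paragraphs)


def word_to_text(path):
    """Return the text of a Word document by driving Word through COM.

    Windows only; needs Word and the PyWin32 extension installed.
    """
    import win32com.client  # imported here so the rest works without PyWin32

    app = win32com.client.Dispatch("Word.Application")
    try:
        doc = app.Documents.Open(os.path.abspath(path))
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        app.Quit()
```

You would pick one of the two based on the file extension. Note that the OpenOffice function is naive: text in a paragraph nested inside another paragraph (a footnote, for example) shows up twice.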
The recipes are 2.26 and 2.27, on pages 101-102. You can probably find them in the ActiveState repository as well: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/279003 In the book, the Word recipe is titled "Extracting Text from Microsoft Word Documents". It uses the PyWin32 extension and COM to perform the conversion.

rd