On 2007-06-13, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Jun 13, 1:28 am, Tim Golden <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] wrote:
>> > Hi all,
>> > I'm currently using antiword to extract content from MS Word files.
>> > Is there another way to do this without relying on any command prompt
>> > application?
>>
>> Well you haven't given your environment, but is there
>> anything to stop you from controlling Word itself via
>> COM? I'm no Word expert, but looking around, this
>> seems to work:
>>
>> <code>
>> import win32com.client
>> word = win32com.client.Dispatch ("Word.Application")
>> doc = word.Documents.Open ("c:/temp/temp.doc")
>> text = doc.Range ().Text
>>
>> open ("c:/temp/temp.txt", "w").write (text.encode ("UTF-8"))
>> </code>
>>
>> TJG
>
> Tim,
> I'm on Linux (RedHat) so using Word is not an option for me. Any
> other suggestions?

There is OpenOffice which has a Python API to it (called UNO). But
piping through antiword is probably easier.

-- 
http://mail.python.org/mailman/listinfo/python-list
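
For what it's worth, piping through antiword from Python could look roughly like the sketch below. It assumes antiword is installed and on the PATH and that a reasonably recent Python with subprocess.run is available. The function name extract_text and its command parameter are made up for illustration, and the UTF-8 decode is an assumption too, since the encoding of antiword's output depends on how it is configured.

```python
import subprocess


def extract_text(doc_path, command=("antiword",)):
    """Run an external converter on doc_path and return what it prints.

    `command` is a hypothetical hook so a different extractor can be
    swapped in; by default it calls antiword with the file as its argument.
    """
    result = subprocess.run(
        [*command, doc_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Assumption: the converter emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("/tmp/report.doc") (the path is just an example) would then give the document text as a string. This still depends on a command-line program, but the Python side never has to touch a shell or a temporary file.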