On 2007-06-13, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On Jun 13, 1:28 am, Tim Golden <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] wrote:
>> > Hi all,
>> > I'm currently using antiword to extract content from MS Word files.
>> > Is there another way to do this without relying on any command prompt
>> > application?
>>
>> Well you haven't given your environment, but is there
>> anything to stop you from controlling Word itself via
>> COM? I'm no Word expert, but looking around, this
>> seems to work:
>>
>> <code>
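>> import win32com.client
>>
>> # Drive Word itself through COM (Windows only; needs pywin32 and Word)
>> word = win32com.client.Dispatch("Word.Application")
>> doc = word.Documents.Open("c:/temp/temp.doc")
>> text = doc.Range().Text
>> doc.Close(False)  # close without saving changes
>> word.Quit()
>>
>> # Binary mode, because the text is encoded to UTF-8 bytes explicitly
>> with open("c:/temp/temp.txt", "wb") as f:
>>     f.write(text.encode("UTF-8"))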
>> </code>
>>
>> TJG
>
> Tim,
> I'm on Linux (RedHat) so using Word is not an option for me.  Any
> other suggestions?

OpenOffice can read Word files and has a Python API (called UNO). But
piping through antiword is probably easier.
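
For what it's worth, piping through antiword from Python might look
roughly like the sketch below. It assumes antiword is installed and on
the PATH, and that its output matches the locale's encoding. The
function name and the `command` parameter are made up for illustration;
they are not part of antiword or of any library.

```python
import subprocess


def extract_word_text(path, command=("antiword",)):
    """Return whatever `command` prints to stdout for the file at `path`.

    `command` defaults to plain antiword; it is a parameter only so that
    another converter (or a stub) can be swapped in. Hypothetical helper.
    """
    result = subprocess.run(
        list(command) + [path],  # e.g. ["antiword", "temp.doc"]
        capture_output=True,     # collect stdout/stderr instead of printing
        text=True,               # decode output using the locale encoding
        check=True,              # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

Calling `extract_word_text("temp.doc")` would then return the document
text as a string. This still shells out to a command-line program, so
it only hides the command prompt rather than removing the dependency.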
-- 
http://mail.python.org/mailman/listinfo/python-list
