Tim Golden wrote:
Shailja Gulati wrote:
Hi,

I am currently working on "Information retrieval from semi-structured documents", which requires reading data from resumes.

Could anyone tell me whether there is a Python API for reading Word documents?

If you haven't already, get hold of the pywin32 extensions:

 http://pywin32.sf.net

<code>
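import win32com.client

# GetObject opens the file through Word's COM automation interface
doc = win32com.client.GetObject ("c:/temp/temp.doc")
# Range () with no arguments spans the whole document
text = doc.Range ().Text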

</code>

Note that this will give you a unicode object with \r line-delimiters.
You could read para by para if that were more useful:

<code>
import win32com.client

doc = win32com.client.GetObject ("c:/temp/temp.doc")
# Read each paragraph's text explicitly via the Range's Text property
lines = [p.Range.Text for p in doc.Paragraphs]

</code>
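
Since the text comes back with \r delimiters, it may help to normalise it before doing any retrieval work. The following is only a rough sketch, not part of the pywin32 API: the helper name split_paragraphs and the sample string are made up for illustration, and the sample stands in for whatever doc.Range ().Text returns on a real resume.

```python
def split_paragraphs(text):
    """Split Word-style text into a list of non-empty paragraph strings."""
    # Word marks paragraph ends with "\r"; map both "\r\n" and bare "\r" to "\n"
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop empty paragraphs and trim surrounding whitespace
    return [line.strip() for line in normalised.split("\n") if line.strip()]


# Hypothetical sample standing in for the value of doc.Range ().Text
sample = "Jane Doe\rPython developer\r\rSkills: COM, text mining\r"
paragraphs = split_paragraphs(sample)
print(paragraphs)
```

This needs neither Word nor pywin32, so the cleanup step can be tried on any platform before wiring it up to the COM call.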

TJG
=======================
I saw this right after responding to Kushal's 5:37AM today posting.

Thank you for the tip.  I'll try these first chance I get.
Word, swriter, whatever - I'm not partial when it comes to automating.


Today is: 20090513

Steve
--
http://mail.python.org/mailman/listinfo/python-list
