Indeed, the shutil.copyfile(doc,txt_doc) was causing the problem for the reason you stated. So, I changed it to this:

    for doc in glob.glob(input):
        txt_split = os.path.splitext(doc)
        txt_doc = txt_split[0] + '.txt'
        txt_doc_dir = os.path.join(input_dir,txt_doc)
        doc_dir = os.path.join(input_dir,doc)
        shutil.copy(doc_dir,txt_doc_dir)

However, I still cannot read the Unicode text from the Word file. If I take out the first for-statement, I get a bunch of garbled text, which isn't helpful. I could save them all manually, but I want to figure out how to do it in Python, since I'm just beginning.

My hunch is that the problem is FileFormat=win32com.client.constants.wdFormatText: it converts to a text file fine, just not to a UTF-8 text file. How can I modify this, or is there another way to code this *.doc to *.txt conversion so that the Unicode characters survive?

Thanks

On Oct 21, 7:02 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> On Sun, 21 Oct 2007 13:35:43 -0300, <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I'm trying to copy a bunch of microsoft word documents that have
> > unicode characters into utf-8 text files. Everything works fine at
> > the beginning. The word documents get converted and new utf-8 text
> > files with the same name get created. And then I try to copy the data
> > and I keep on getting "TypeError: coercing to Unicode: need string or
> > buffer, instance found". I'm probably copying the word document
> > wrong. What can I do?
>
> Always remember to provide the full traceback.
> Where do you get the error? In the last line: shutil.copyfile?
> If the file already contains the text in utf-8, and you just want to make
> a copy, use shutil.copy as before.
> (or, why not tell Word to save the file using the .txt extension in the
> first place?)
>
> > for doc in glob.glob(input):
> >     txt_split = os.path.splitext(doc)
> >     txt_doc = txt_split[0] + '.txt'
> >     txt_doc = codecs.open(txt_doc,'w','utf-8')
> >     shutil.copyfile(doc,txt_doc)
>
> copyfile expects path names as arguments, not a
> codecs-wrapped-file-like-object
>
> --
> Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
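A possible approach, sketched here and not tested against a real Word install: Word's Document.SaveAs takes an Encoding argument alongside FileFormat, so passing msoEncodingUTF8 (65001) together with wdFormatText (2) should make Word write UTF-8 text directly (option 1). Alternatively, saving with wdFormatUnicodeText (7) makes Word write UTF-16, which Python can then re-encode to UTF-8 (option 2). That second step also shows the distinction Gabriel points out: shutil.copyfile takes path names, while shutil.copyfileobj takes open file objects. The helper names (txt_path_for, save_as_utf8_text, utf16_to_utf8, convert_folder) are made up for illustration, and convert_folder assumes Windows with pywin32 and Word installed, using gencache.EnsureDispatch (the early-bound wrapper that win32com.client.constants also relies on).

```python
import glob
import io
import os
import shutil

# Numeric values of the Word/Office enum constants, so the sketch does not
# depend on win32com.client.constants having been generated.
WD_FORMAT_TEXT = 2           # wdFormatText
WD_FORMAT_UNICODE_TEXT = 7   # wdFormatUnicodeText (Word writes UTF-16)
MSO_ENCODING_UTF8 = 65001    # msoEncodingUTF8


def txt_path_for(doc_path):
    """Return the .txt path next to a .doc file (same base name)."""
    return os.path.splitext(doc_path)[0] + ".txt"


def save_as_utf8_text(document, txt_path):
    """Option 1: ask Word to write UTF-8 itself.

    `document` is assumed to be an open Word Document COM object; the
    Encoding argument of SaveAs selects the text file's code page.
    """
    document.SaveAs(txt_path, FileFormat=WD_FORMAT_TEXT,
                    Encoding=MSO_ENCODING_UTF8)


def utf16_to_utf8(src_path, dst_path):
    """Option 2: re-encode a file that Word saved as wdFormatUnicodeText.

    shutil.copyfile wants path names; shutil.copyfileobj wants open file
    objects, so the decoding/encoding file objects are passed to it.
    newline="" keeps Word's line endings unchanged.
    """
    with io.open(src_path, "r", encoding="utf-16", newline="") as src:
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            shutil.copyfileobj(src, dst)


def convert_folder(pattern):
    """Save every Word file matching `pattern` as UTF-8 text (Windows only)."""
    # Imported here so the helpers above work without pywin32 installed.
    from win32com.client import gencache
    word = gencache.EnsureDispatch("Word.Application")
    word.DisplayAlerts = 0  # wdAlertsNone: don't stop on dialogs
    try:
        for doc_path in glob.glob(pattern):
            document = word.Documents.Open(os.path.abspath(doc_path))
            try:
                save_as_utf8_text(document,
                                  os.path.abspath(txt_path_for(doc_path)))
            finally:
                document.Close(False)  # don't save the document again
    finally:
        word.Quit()
```

For example, convert_folder(r"C:\docs\*.doc") (a hypothetical folder) would write name.txt next to each name.doc. Word needs absolute paths, hence the os.path.abspath calls. To use option 2 instead, pass FileFormat=WD_FORMAT_UNICODE_TEXT (without Encoding) to SaveAs and run utf16_to_utf8 on each resulting file; that keeps the UTF-8 encoding step entirely in Python.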