That KB document was really helpful, but the problem still isn't
solved. What's weird now is that the Unicode characters like
become è in some odd conversion. However, I noticed when I try to
open the Word documents after I run the first for statement that Word
gives me a window that says File
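
A pattern like è is what typically shows up when UTF-8 bytes are read back as a single-byte Windows code page. The thread does not confirm that this is the cause here, so treat the following as a minimal sketch under that assumption. The character è is only an illustrative stand-in, and cp1252 is an assumed code page.

```python
# Hypothetical sample character; the one from the original post is not known.
original = "\u00e8"  # è

# Encoding to UTF-8 yields two bytes for this character.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong codec yields two separate characters.
garbled = utf8_bytes.decode("cp1252")

# If nothing else touched the text, reversing the wrong step recovers it.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(utf8_bytes), garbled, repaired)
```

If this is what is happening, the data on disk may already be valid UTF-8 and the garbling may come from whatever program is used to view the file.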
On Sun, 21 Oct 2007 15:32:57 -0300, <[EMAIL PROTECTED]> wrote:
> However, I still cannot read the Unicode from the Word file. If I take
> out the first for-statement, I get a bunch of garbled text, which
> isn't helpful. I would save them all manually, but I want to figure
> out how to do it in
Indeed, the shutil.copyfile(doc,txt_doc) was causing the problem for
the reason you stated. So, I changed it to this:
for doc in glob.glob(input):
    txt_split = os.path.splitext(doc)
    txt_doc = txt_split[0] + '.txt'
    txt_doc_dir = os.path.join(input_dir, txt_doc)
    doc_dir = os.path.join
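
As a side note on the path handling in the loop above, here is a minimal sketch of deriving the .txt name from each .doc path. The helper name txt_path_for is hypothetical and not from the original post. glob.glob returns paths that already include the directory part of the pattern, so joining the directory on again may not be needed, depending on how the pattern was built.

```python
import os


def txt_path_for(doc_path):
    """Return doc_path with its extension replaced by .txt (hypothetical helper)."""
    # splitext splits only at the last dot, so "a.b.doc" gives ("a.b", ".doc").
    root, _ext = os.path.splitext(doc_path)
    return root + ".txt"


# The directory part of the input path is preserved in the result.
example = txt_path_for(os.path.join("docs", "report.doc"))
print(example)
```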
On Sun, 21 Oct 2007 13:35:43 -0300, <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm trying to copy a bunch of Microsoft Word documents that have
> Unicode characters into UTF-8 text files. Everything works fine at
> the beginning. The Word documents get converted and new UTF-8 text
> files with
Hi all,
I'm trying to copy a bunch of Microsoft Word documents that have
Unicode characters into UTF-8 text files. Everything works fine at
the beginning. The Word documents get converted and new UTF-8 text
files with the same name get created. And then I try to copy the data
and I keep on gett