Re: Problem Converting Word to UTF8 Text File

2007-10-22 Thread patrick . waldo
That KB document was really helpful, but the problem still isn't solved. What's weird now is that the Unicode characters like become è in some odd conversion. However, I noticed that when I try to open the Word documents after I run the first for statement, Word gives me a window that says File
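
The "è" symptom described above is consistent with UTF-8 bytes being decoded as Windows-1252. A minimal sketch of that effect, assuming the character lost from the message was "è" and that the text was viewed as cp1252 (neither is confirmed by the thread):

```python
# Assumption: the original character was "è" (U+00E8); the archive dropped it.
original = "\u00e8"

# UTF-8 encodes U+00E8 as two bytes: 0xC3 0xA8.
utf8_bytes = original.encode("utf-8")

# Reading those two bytes as Windows-1252 yields two characters instead of one.
mojibake = utf8_bytes.decode("cp1252")

# The damage is reversible as long as no bytes were lost along the way.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(repr(mojibake), repr(repaired))
```

If this is what is happening, the file itself may already be valid UTF-8 and only the program displaying it is using the wrong encoding.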

Re: Problem Converting Word to UTF8 Text File

2007-10-21 Thread Gabriel Genellina
On Sun, 21 Oct 2007 15:32:57 -0300, <[EMAIL PROTECTED]> wrote:
> However, I still cannot read the unicode from the Word file. If I take
> out the first for-statement, I get a bunch of garbled text, which
> isn't helpful. I would save them all manually, but I want to figure
> out how to do it in

Re: Problem Converting Word to UTF8 Text File

2007-10-21 Thread patrick . waldo
Indeed, the shutil.copyfile(doc,txt_doc) was causing the problem for the reason you stated. So, I changed it to this:

    for doc in glob.glob(input):
        txt_split = os.path.splitext(doc)
        txt_doc = txt_split[0] + '.txt'
        txt_doc_dir = os.path.join(input_dir,txt_doc)
        doc_dir = os.path.join
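
For context, shutil.copyfile copies bytes verbatim, so a .doc copied to a .txt name is still a binary Word file. The sketch below shows the two separate steps involved: deriving the .txt name, and re-encoding a file that is already plain text. It assumes the source was saved from Word as "Unicode Text" (taken here to be UTF-16); the helper names txt_path_for and transcode are hypothetical and are not from the thread.

```python
import io
import os


def txt_path_for(doc_path):
    """Return doc_path with its extension replaced by .txt."""
    root, _ext = os.path.splitext(doc_path)
    return root + ".txt"


def transcode(src_path, dst_path, src_encoding="utf-16", dst_encoding="utf-8"):
    """Re-encode a plain-text file; this does not parse binary .doc files."""
    # Assumption: src_path holds text in src_encoding, not a Word binary.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with io.open(dst_path, "w", encoding=dst_encoding) as dst:
        dst.write(text)
```

A binary .doc still has to be opened by Word itself (or another converter) before a step like transcode can be applied.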

Re: Problem Converting Word to UTF8 Text File

2007-10-21 Thread Gabriel Genellina
On Sun, 21 Oct 2007 13:35:43 -0300, <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm trying to copy a bunch of microsoft word documents that have
> unicode characters into utf-8 text files. Everything works fine at
> the beginning. The word documents get converted and new utf-8 text
> files with

Problem Converting Word to UTF8 Text File

2007-10-21 Thread patrick . waldo
Hi all, I'm trying to copy a bunch of Microsoft Word documents that have Unicode characters into UTF-8 text files. Everything works fine at the beginning. The Word documents get converted and new UTF-8 text files with the same name get created. And then I try to copy the data and I keep on gett