On Sep 4, 4:18 pm, Tommy Nordgren <[EMAIL PROTECTED]> wrote:
> On Sep 4, 2008, at 9:54 PM, [EMAIL PROTECTED] wrote:
>
> > Hi Everyone,
> >
> > I had previously asked a similar question,
> >http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
> >
> > but at that point I was using Windows and now I am using Linux.
> > Basically, I have some .doc files that I need to convert into txt
> > files encoded in utf-8. However, win32com.client doesn't work in
> > Linux.
> >
> > It's been giving me quite a headache all day. Any ideas would be
> > greatly appreciated.
> >
> > Best,
> > Patrick
> >
> > #Windows Code:
> > import glob,os,codecs,shutil,win32com.client
> > from win32com.client import Dispatch
> >
> > input = '/home/pwaldo2/work/workbench/current_documents/*.doc'
> > input_dir = '/home/pwaldo2/work/workbench/current_documents/'
> > outpath = '/home/pwaldo2/work/workbench/current_documents/TXT/'
> >
> > for doc in glob.glob1(input):
> >     WordApp = Dispatch("Word.Application")
> >     WordApp.Visible = 1
> >     WordApp.Documents.Open(doc)
> >     WordApp.ActiveDocument.SaveAs(doc,7)
> >     WordApp.ActiveDocument.Close()
> >     WordApp.Quit()
> >
> > for doc in glob.glob(input):
> >     txt_split = os.path.splitext(doc)
> >     txt_doc = txt_split[0] + '.txt'
> >     txt_doc_path = os.path.join(outpath,txt_doc)
> >     doc_path = os.path.join(input_dir,doc)
> >     shutil.copy(doc_path,txt_doc_path)
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
> You can do it manually with Open Office. <http://www.openoffice.org/>
> A free office suite.

On Debian there is a package called "unoconv"--written in Python--that
can do the conversions from the command line. It requires a running
instance of Open Office.

However, the doc-to-txt conversion of Open Office isn't that good. (It
wasn't as good as Word's formatted text converter, last time I used
it.)

Carl Banks
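
For what it's worth, here is a minimal sketch of how the unoconv route might be scripted from Python. It assumes unoconv is installed and on the PATH, that "unoconv -f txt FILE" writes FILE's text next to the source with a .txt extension, and that Open Office is available to it. The helper names (txt_path_for, unoconv_command, convert_all) are made up for illustration. I have not checked which encoding the resulting text files use, so verify that they really are utf-8 before relying on them.

```python
import glob
import os
import shutil
import subprocess


def txt_path_for(doc_path, out_dir):
    """Return the .txt path inside out_dir that corresponds to doc_path."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".txt")


def unoconv_command(doc_path):
    """Build the unoconv command line that asks for plain-text output."""
    return ["unoconv", "-f", "txt", doc_path]


def convert_all(input_dir, out_dir):
    """Convert every .doc in input_dir and collect the .txt files in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for doc_path in sorted(glob.glob(os.path.join(input_dir, "*.doc"))):
        # check=True raises CalledProcessError if unoconv reports a failure.
        subprocess.run(unoconv_command(doc_path), check=True)
        # Assumption: unoconv wrote the result next to the source file.
        produced = os.path.splitext(doc_path)[0] + ".txt"
        shutil.move(produced, txt_path_for(doc_path, out_dir))
```

With the paths from the original post, this would be called as convert_all(input_dir, outpath). Keeping the command construction in its own function makes it easy to swap in different unoconv options without touching the loop.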