On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:
> A Word doc (as your subject mentions) is a binary format. There's
> the older .doc and the newer .docx (which is actually a .zip file
> with a particular content-structure renamed to .docx).

I am using .doc files only.
>
> For the older .doc file, it's a binary format, so even if you can
> successfully find & swap out sequences of 7 chars for a single char,
> it might screw up the internal offsets, breaking your file.

I do not save the file out again. I only try to change every en-dash and em-dash to a plain dash in memory, then search and print the results to another file, closing the searched file without writing to it.

>
> Additionally, I vaguely remember sparring with them using some 16-bit
> wide characters in .doc files so you might have to search for
> atrocious things like b"\x00&\x00#\x00x\x002\x000\x001\x002" (each
> character being prefixed with "\x00".

Hmm, I thought that was what I was doing. Can anyone figure out why the syntax is wrong for Word 2007 document binary file data?

-- 
https://mail.python.org/mailman/listinfo/python-list
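
For what it is worth, here is a minimal sketch of the read-only search described above, working on the raw bytes rather than on a decoded string. It rests on some assumptions that are not confirmed anywhere in this thread: that the text in a legacy .doc is stored either as 8-bit cp1252 or as 16-bit UTF-16LE (in which case the zero byte would follow each ASCII character rather than precede it), and that the dashes appear as literal characters and not as "&#x...;" entities. The helper names (container_kind, dash_patterns, count_dashes) are made up for illustration. Since Word 2007 saves .docx by default, the sketch first checks the leading magic bytes to see which container the file really is.

```python
# Hypothetical helpers for illustration; this is not a real .doc parser.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc (OLE2 container)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx is a zip archive

DASHES = {"en-dash": "\u2013", "em-dash": "\u2014"}
ENCODINGS = ("utf-16-le", "cp1252")  # assumed text encodings inside a .doc


def container_kind(data):
    """Guess the container type from the leading magic bytes."""
    if data.startswith(OLE2_MAGIC):
        return "doc"
    if data.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"


def dash_patterns():
    """Map (dash name, encoding) to the byte sequence to search for."""
    return {
        (name, enc): char.encode(enc)
        for name, char in DASHES.items()
        for enc in ENCODINGS
    }


def count_dashes(data):
    """Count raw occurrences of each encoded dash in the byte string."""
    return {key: data.count(pat) for key, pat in dash_patterns().items()}
```

To try it, read the file with open(path, "rb").read() (path being whatever your document is called) and pass the bytes to container_kind() and count_dashes(). Nothing is written back, so the internal offsets are not at risk. Be aware that a single byte such as 0x96 can also turn up in the non-text parts of the file, so the cp1252 counts are only an upper bound.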