Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-14 Thread scottcabit
On Tuesday, May 13, 2014 4:26:51 PM UTC-4, MRAB wrote:
> 0x96 is a hexadecimal literal for an int. Within a string you need \x96
> (it's \x for 2 hex digits, \u for 4 hex digits, \U for 8 hex digits).

Yes, that was my problem. Figured it out just after posting my last message. using \x96
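For anyone landing on this thread later, here is a minimal sketch of the distinction MRAB describes. It assumes the dashes are stored as single Windows-1252 bytes (0x96 for the en dash, 0x97 for the em dash), which may not hold for every .doc file, and `replace_dashes` is a hypothetical helper name, not something from the original code. Blindly replacing bytes in a binary .doc can also touch non-text data, so treat it as an illustration only.

```python
import re

# 0x96 is an int literal; "\x96" inside a bytes literal is the byte itself.
assert 0x96 == 150
assert b"\x96" == bytes([0x96])

# In Windows-1252, byte 0x96 decodes to an en dash and 0x97 to an em dash.
assert b"\x96".decode("cp1252") == "\u2013"
assert b"\x97".decode("cp1252") == "\u2014"


def replace_dashes(data: bytes) -> bytes:
    """Hypothetical helper: swap cp1252 en/em dash bytes for an ASCII hyphen."""
    # The pattern, the replacement and the subject are all bytes.
    return re.sub(rb"[\x96\x97]", b"-", data)


sample = b"pages 10\x9620, see also\x97maybe"
cleaned = replace_dashes(sample)
```

The same escape works with `bytes.replace`, for example `data.replace(b"\x96", b"-")`, when no regular expression is needed.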

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-13 Thread scottcabit
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
> You may have missed my follow up post, where I said I had not noticed you
> were operating on a binary .doc file.
>
> If you're not willing or able to use a full-blown doc parser, say by
> controlling Word or LibreOffice, the

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-12 Thread scottcabit
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
> Good:
>
> fStr = re.sub(b'‒', b'-', fStr)

Doesn't work... the document has been verified to contain endash and emdash characters, but this does NOT replace them.

> Better:
>
> fStr = fStr.replace(b'

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:
> A Word doc (as your subject mentions) is a binary format. There's
> the older .doc and the newer .docx (which is actually a .zip file
> with a particular content-structure renamed to .docx).

I am using .doc files only.

> F
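Tim's point about .docx being a renamed .zip can be checked from Python. The following is only a sketch: `looks_like_docx` is a hypothetical helper name, and it assumes the usual .docx layout in which the main text lives in a member called `word/document.xml`. An old binary .doc is not a ZIP archive, so it fails the check.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Hypothetical helper: True if data is a ZIP holding word/document.xml."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        return "word/document.xml" in zf.namelist()


# Build a tiny stand-in archive in memory (not a valid Word document,
# just enough structure to exercise the check).
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _zf:
    _zf.writestr("word/document.xml", "<w:document/>")
docx_like = _buf.getvalue()

# An old-style .doc starts with the OLE2 signature, not a ZIP header.
doc_like = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 64
```

With real files, the bytes would come from `open(fn, "rb").read()`, where `fn` is the path variable mentioned in the original question.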

Re: Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
> re.sub _returns_ its result (strings are immutable).

Ahh... so I tried this for each re.sub

fStr = re.sub(b'‒','-',fStr)

No errors running it, but it still does nothing.
--
https://mail.python.org/mailman/listinfo/python-list
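A short sketch of the "re.sub returns its result" point, using made-up sample bytes rather than the poster's actual file. Note that Python 3 expects the pattern, the replacement and the subject to be all bytes or all str, so the example keeps everything as bytes.

```python
import re

text = b"alpha\x96beta"

# Calling re.sub and discarding the return value leaves text unchanged,
# because bytes (like str) objects are immutable.
re.sub(rb"\x96", b"-", text)
unchanged = text

# Rebinding the name to the returned value is what makes the change stick.
text = re.sub(rb"\x96", b"-", text)
```

After the second call `text` holds the hyphenated version, while `unchanged` still refers to the original bytes.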

Why isn't my re.sub replacing the contents of my MS Word file?

2014-05-09 Thread scottcabit
Hi, here is a snippet of code that opens a file (fn contains the path\name) and first tries to replace all endash, emdash etc. characters with simple dash characters, before doing a search. But the replaces are not having any effect. Obviously a syntax problem... what silly thing am I doing