On Tuesday, May 13, 2014 4:26:51 PM UTC-4, MRAB wrote:
> 0x96 is a hexadecimal literal for an int. Within a string you need \x96
> (it's \x for 2 hex digits, \u for 4 hex digits, \U for 8 hex digits).
Yes, that was my problem. I figured it out just after posting my last
message and am now using \x96.
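
For reference, a small sketch of the distinction described above, assuming Python 3 and assuming the file's 8-bit text is Windows-1252 (where byte 0x96 is the en dash and 0x97 the em dash). The variable names are only illustrative.

```python
# 0x96 on its own is an int literal, not a character or a byte.
as_int = 0x96                 # the integer 150

# Inside a bytes or str literal the escape forms are:
one_byte = b"\x96"            # \x + 2 hex digits: a single byte
en_dash = "\u2013"            # \u + 4 hex digits: EN DASH
em_dash = "\U00002014"        # \U + 8 hex digits: EM DASH

# Assumption: the data is Windows-1252, where 0x96/0x97 are the dashes.
print(as_int)
print(one_byte.decode("cp1252") == en_dash)
print(b"\x97".decode("cp1252") == em_dash)
```
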
On Tuesday, May 13, 2014 9:49:12 AM UTC-4, Steven D'Aprano wrote:
>
> You may have missed my follow up post, where I said I had not noticed you
> were operating on a binary .doc file.
>
> If you're not willing or able to use a full-blown doc parser, say by
> controlling Word or LibreOffice, the
On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
> Good:
>
> fStr = re.sub(b'‒', b'-', fStr)
>
Doesn't work...the document has been verified to contain endash and emdash
characters, but this does NOT replace them.
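
One possible reason the literal never matches, offered as a guess rather than a diagnosis: a dash pasted into the source ends up as whatever bytes the source file's encoding gives it, and those need not be the bytes stored in the .doc. The sketch below (Python 3) spells the bytes out explicitly instead. It assumes the document stores its text either as Windows-1252 bytes or as UTF-16-LE; `DASH_BYTES` and `dashes_to_hyphen` are hypothetical names, not anything from this thread.

```python
# Assumed encodings of the dashes inside the file (not confirmed by the thread).
DASH_BYTES = {
    "cp1252": [b"\x96", b"\x97"],               # en dash, em dash
    "utf-16-le": [b"\x13\x20", b"\x14\x20"],    # U+2013, U+2014
}

def dashes_to_hyphen(data: bytes, encoding: str = "cp1252") -> bytes:
    """Return a copy of data with en/em dashes replaced by '-'."""
    hyphen = "-".encode(encoding)   # same width as the dash it replaces
    for dash in DASH_BYTES[encoding]:
        data = data.replace(dash, hyphen)
    return data

# Made-up sample text, encoded the way the file is assumed to store it.
sample = "pages 10\u201320 \u2014 done".encode("cp1252")
print(dashes_to_hyphen(sample))
```

Because each replacement has the same length as the bytes it replaces, the overall size of the data does not change. On raw binary data the two-byte UTF-16-LE patterns could in principle also match bytes that are not text, so the result is best used for searching only.
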
>
> Better:
>
> fStr = fStr.replace(b'
On Friday, May 9, 2014 4:09:58 PM UTC-4, Tim Chase wrote:
> A Word doc (as your subject mentions) is a binary format. There's
> the older .doc and the newer .docx (which is actually a .zip file
> with a particular content-structure renamed to .docx).
>
I am using .doc files only.
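
A quick way to check which container a file really is, as a sketch: legacy .doc files are OLE2 compound files and .docx files are ZIP archives, and each starts with a well-known signature. `sniff_word_format` is a hypothetical helper name, and the check only looks at the first few bytes.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .doc container
ZIP_MAGIC = b"PK\x03\x04"                          # .docx is a ZIP archive

def sniff_word_format(header: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "doc"
    if header.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"

# Typical use (fn is the path to the document):
#     with open(fn, "rb") as f:
#         kind = sniff_word_format(f.read(8))
print(sniff_word_format(OLE2_MAGIC + b"\x00" * 8))
```
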
>
> F
>
> re.sub _returns_ its result (strings are immutable).
Ahh, so I tried this for each re.sub:
fStr = re.sub(b'‒','-',fStr)
No errors running it, but it still does nothing.
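
A minimal sketch of the point about re.sub returning its result, assuming Python 3 and assuming the dashes are the single Windows-1252 bytes 0x96 and 0x97. Pattern, replacement and subject are all bytes here, and the sample data is made up.

```python
import re

fStr = b"2010\x962014 \x97 see notes"   # made-up sample, cp1252 dashes assumed

# Calling re.sub without using the return value changes nothing,
# because bytes (like str) are immutable.
re.sub(rb"[\x96\x97]", b"-", fStr)
unchanged = fStr

# Rebinding the name to the returned value is what has an effect.
fStr = re.sub(rb"[\x96\x97]", b"-", fStr)
print(unchanged)
print(fStr)
```
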
Hi,
here is a snippet of code that opens a file (fn contains the path\name) and
first tries to replace all endash, emdash etc. characters with simple dash
characters before doing a search.
But the replaces are not having any effect. Obviously a syntax
problem... what silly thing am I doing
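
A rough end-to-end sketch of the task described here (read the file, turn dashes into plain hyphens, then search), assuming Python 3 and assuming the dashes are stored as the Windows-1252 bytes 0x96 and 0x97. It uses a byte translation table rather than repeated substitutions; `DASH_TABLE`, `read_with_plain_dashes` and the demo file are hypothetical stand-ins, not the original snippet.

```python
import os
import re
import tempfile

# Assumption: dashes are stored as the Windows-1252 bytes 0x96 and 0x97.
DASH_TABLE = bytes.maketrans(b"\x96\x97", b"--")

def read_with_plain_dashes(fn: str) -> bytes:
    """Read fn in binary mode and map en/em dash bytes to b'-'."""
    with open(fn, "rb") as f:
        return f.read().translate(DASH_TABLE)

# Demo on a throwaway file standing in for the real document.
with tempfile.NamedTemporaryFile(suffix=".doc", delete=False) as tmp:
    tmp.write(b"Report 2013\x962014 \x97 draft")
    demo_path = tmp.name

fStr = read_with_plain_dashes(demo_path)
os.remove(demo_path)

# Search the normalized bytes with a bytes pattern.
match = re.search(rb"\d{4}-\d{4}", fStr)
print(match.group(0) if match else None)
```

The translation touches every 0x96 or 0x97 byte in the file, including any that are not text, so the result is meant for searching and should not be written back over the original document.
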