I'm getting bogged down with backslash escaping. I have some text files containing characters with the 8th bit set. These characters are encoded one of two ways: either "=hh" or "\xhh", where "h" represents a hex digit, and "\x" is a literal backslash followed by a lower-case x.
Catching the first case with a regex is simple. But when I try to write a regex to catch the second case, I mess up the escaping. I took at look at http://docs.python.org/howto/regex.html, especially the section titled "The Backslash Plague". I started out trying : d...@dan:~/personal/usenet$ python Python 2.7 (r27:82500, Nov 15 2010, 12:10:23) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import re >>> r = re.compile('\\\\x([0-9a-fA-F]{2})') >>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 characters \xefn \xeft." >>> m = r.search(a) >>> m No match. I then followed the advice of the above-mentioned document, and expressed the regex as a raw string: >>> r = re.compile(r'\\x([0-9a-fA-F]{2})') >>> r.search(a) Still no match. I'm obviously missing something. I spent a fair bit of time playing with this over the weekend, and I got nowhere. Now it's time to ask for help. What am I doing wrong here? -- http://mail.python.org/mailman/listinfo/python-list