On Fri, 22 May 2009 07:47:49 -0700, walterbyrd wrote:

> On May 21, 9:44 pm, "Rhodri James" <rho...@wildebst.demon.co.uk> wrote:
>
>> Escaping the delimiting quote is the *one* time backslashes have a
>> special meaning in raw string literals.
>
> If that were true, then wouldn't r'\b' be treated as two characters?

It is.

>>> len(r'\b')
2

>> This calls re.sub with a pattern string object that contains two
>> characters, a backslash followed by an 'n'. This combination *does*
>> have a special meaning to the sub function, which does its own
>> translation of the pattern into a single newline character.
>
> So when do I know when a raw string is treated as a raw string, and when
> it's not?

You have misunderstood. All strings are strings, but there are different
ways to build a string. Raw strings are not different from ordinary
strings, they're just a different way to *build* an ordinary string.

Here are four ways to make the same string, a backslash followed by a
lowercase b:

    "\\b"         # use an ordinary string, and escape the backslash
    chr(92)+"b"   # use the chr() function
    "\x5cb"       # use a hex escape
    r"\b"         # use a raw string, no escaping needed

The results you get from all of those (and many, many more!) are the same
string: two characters, with identical contents. They're just written
differently as source code.

Now, in regular expressions, the RE engine expects to see special codes
inside the string that have special meanings. For example, backslash
followed by lowercase b has a special meaning: it matches a word boundary.
So to create a string containing that regex, you can use any of the above
(or any of the others). The RE engine doesn't know, and can't know, how
you generated the regex. All it sees is a string containing a backslash
followed by lowercase b.

But if you forget that Python uses backslash escapes in strings, and just
write "\b", then the compiler creates the one-character string chr(8)
(backspace, BS), which has no special meaning to the RE engine; it only
matches a literal backspace character.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
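
P.S. If you'd rather check the above than take it on trust, here is a
small sketch you can run. The names (spellings, word, bs) and the sample
text are made up for illustration and are not from the thread. It only
assumes the standard re module.

```python
import re

# Four spellings of the same two-character string: a backslash, then 'b'.
spellings = ["\\b", chr(92) + "b", "\x5cb", r"\b"]
for s in spellings:
    assert s == r"\b"
    assert len(s) == 2

# Without the r prefix (or a doubled backslash), "\b" is a single
# character: backspace, chr(8).
assert "\b" == chr(8)
assert len("\b") == 1

text = "cat concat cat"

# The RE engine receives backslash + 'b' and treats it as a word boundary.
word = re.compile(r"\bcat\b")

# Here Python has already turned each "\b" into chr(8) before the RE
# engine sees the pattern, so it looks for literal backspace characters.
bs = re.compile("\bcat\b")

print(word.findall(text))        # ['cat', 'cat']
print(bs.findall(text))          # []
print(bs.findall("\bcat\b"))     # ['\x08cat\x08']
```

The two patterns differ only in how the source code was written, but by
the time re.compile() is called they are different strings, which is why
they match different things.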