Alan G Isaac wrote:
On 12/17/2009 2:45 PM, MRAB wrote:
re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').
However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').
However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.
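
The caching behaviour described above can be checked with a short sketch. It assumes a current CPython, uses the public re.purge() rather than re._cache.clear(), and its variable names are only illustrative. It tests object identity only, because whether two distinct pattern objects compare equal with == may depend on the Python version.

```python
import re

escaped = re.compile('a\\nc')  # pattern text is 4 characters: a, backslash, n, c
literal = re.compile('a\nc')   # pattern text is 3 characters: a, newline, c
subject = 'a\nc'               # a, newline, c

# Both patterns match the same three-character subject string.
both_match = bool(escaped.fullmatch(subject)) and bool(literal.fullmatch(subject))

# Two compiles of the same string and flags return the cached object.
first = re.compile('a\nc')
second = re.compile('a\nc')
same_from_cache = first is second

# After the cache is emptied, compiling again builds a new object.
re.purge()
third = re.compile('a\nc')
fresh_after_purge = third is not first

print(both_match, same_from_cache, fresh_after_purge)
```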
OK, this is helpful.
(I did check equality, but did not understand that
I got True only because re used caching.)
So is the bottom line the following?
A replacement string is not just "converted"
as described in the documentation; essentially,
it is compiled?
But that cannot quite be right. E.g., \b will be a backspace,
not a word boundary. So then the question arises
again, why isn't '\\' a backslash? Just because?
Why does it not get the "obvious" conversion?
If you give the re module a string containing \b, e.g. '\\b' or r'\b',
then it will compile it to a word boundary if it's in a regex string or
a backspace if it's in a replacement string. This is different from
giving the re module a string which actually contains a backspace, e.g.
'\b'.
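
The three cases can be told apart with a minimal sketch like the one below. The sample strings and variable names are made up for illustration, and the results are what a current CPython re module is expected to give.

```python
import re

# Backslash + b in a regex string: a word boundary.
boundary_hits = re.findall(r'\bcat\b', 'cat concat cat')

# Backslash + b in a replacement string: converted to a backspace character.
replaced = re.sub('x', r'\b', 'axc')

# A string that already contains a backspace character ('\b' without the r
# prefix) is a one-character pattern that matches a literal backspace.
literal_pattern = '\b'
matches_backspace = re.search(literal_pattern, 'a\x08c') is not None
matches_plain_text = re.search(literal_pattern, 'abc') is not None

print(boundary_hits, repr(replaced), matches_backspace, matches_plain_text)
```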
Because the re module uses backslashes for escaping, you'll need to
escape a literal backslash with a backslash in the string you give it.
But string literals also use backslashes for escaping, so you'll need to
escape each of those backslashes with a backslash.
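
As a rough illustration of the two layers of escaping, the sketch below matches and inserts a single literal backslash. The example strings are invented, and the re.error in the last step is what a current CPython is expected to raise for a replacement string that ends in a lone backslash.

```python
import re

text = 'C:\\temp'             # 7 characters: C : \ t e m p

# The re module must receive two backslashes to mean one literal backslash.
# In an ordinary string literal that takes four; a raw literal takes two.
pattern_plain = '\\\\'
pattern_raw = r'\\'
found = re.findall(pattern_raw, text)

# The same doubling applies to a replacement string.
result = re.sub('/', r'\\', 'a/b')    # a, backslash, b

# A replacement holding only one backslash ('\\' as a literal) is an
# incomplete escape as far as the re module is concerned.
try:
    re.sub('/', '\\', 'a/b')
    lone_backslash_rejected = False
except re.error:
    lone_backslash_rejected = True

print(found, result, lone_backslash_rejected)
```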