Alan G Isaac <alan.is...@gmail.com> wrote:
>>> (re.sub('abc', r'a\nb\n.c\a', '123abcdefg')
...  == re.sub('abc', 'a\\nb\\n.c\\a', '123abcdefg')
...  == re.sub('abc', 'a\nb\n.c\a', '123abcdefg'))
True
Why are the first two strings being treated as if they are the last one?
On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
They aren't. The last string is different.
Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)
Alan G Isaac <alan.is...@gmail.com> wrote:
More simply, consider::
>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python26\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python26\lib\re.py", line 260, in _compile_repl
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)
Why is this the proper handling of what one might think would be an
obvious substitution?
On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:
Is this what you want? What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL) so it barfs.
>>> re.sub('abc', r'\\', '123abcdefg')
'123\\defg'
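For comparison, the three cases discussed so far can be put side by side. This is only a minimal sketch, assuming a current Python 3 interpreter, where the exception is exposed as re.error rather than sre_constants.error; the variable names are illustrative and not from the thread.

```python
import re

# A replacement string holding one backslash character is an incomplete
# escape sequence as far as the replacement template is concerned.
try:
    re.sub('abc', '\\', '123abcdefg')
    lone_backslash_failed = False
except re.error:
    lone_backslash_failed = True

# Two backslash characters in the template yield one in the result.
doubled = re.sub('abc', r'\\', '123abcdefg')

# A function's return value is inserted as-is, with no escape processing.
via_function = re.sub('abc', lambda m: '\\', '123abcdefg')

print(lone_backslash_failed)    # whether the lone backslash was rejected
print(doubled == via_function)  # whether both routes give the same result
print(doubled)                  # the substituted string
```

Passing a function is the route that avoids a second round of escape handling, which matches what Peter's example is said to show.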
Turning again to the documentation:
"if it is a string, any backslash escapes in it are processed.
That is, \n is converted to a single newline character, \r is
converted to a linefeed, and so forth."
So why is '\n' converted to a newline but '\\' does not become a literal
backslash? OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::
>>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
True
>>> re.compile('a\\nc') == re.compile('a\nc')
False
So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?
If the answer looks too obvious to state, assume I'm missing it anyway
and please state it. As I said, I seldom use the re module.
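One way to look at the question is that the replacement argument passes through two separate rounds of escape handling: Python's own string-literal parsing, and then the processing that the quoted documentation describes for re.sub. The sketch below, which assumes Python 3 and uses illustrative names of my own, shows the two strings converging as replacements. It also shows that both strings match the same text when used as patterns, so comparing the compiled objects with == is not a test of matching behaviour.

```python
import re

two_chars = 'a\\nc'  # four characters: a, backslash, n, c
newline = 'a\nc'     # three characters: a, newline, c

# As replacement templates: re.sub expands backslash-n itself, so the
# four-character string ends up producing the same output as the
# three-character one.
as_repl_1 = re.sub('abc', two_chars, '1abcd')
as_repl_2 = re.sub('abc', newline, '1abcd')

# As patterns: the regex parser also reads backslash-n as a newline, so
# both patterns match the same text even though the strings differ.
pattern_1_matches = re.search(two_chars, 'a\nc') is not None
pattern_2_matches = re.search(newline, 'a\nc') is not None

# Compiled pattern objects built from different strings do not compare equal.
objects_differ = re.compile(two_chars) != re.compile(newline)

# Returning the string from a function skips the template processing.
literal = re.sub('abc', lambda m: two_chars, '1abcd')

print(as_repl_1 == as_repl_2)
print(pattern_1_matches, pattern_2_matches, objects_differ)
print(repr(literal))
```

If a literal backslash is wanted in the output without using a function, doubling each backslash in the replacement (for example with repl.replace('\\', '\\\\')) is one common approach.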
Alan Isaac