On 12/17/2009 11:24 AM, Richard Brodie wrote:
A raw string is not a distinct type from an ordinary string
in the same way byte strings and Unicode strings are. It
is merely a notation for constants, like writing integers
in hexadecimal.
>>> (r'\n', u'a', 0x16)
('\\n', u'a', 22)
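A quick check of the point quoted above, written as a sketch for Python 3 rather than the 2.6 interpreter used in this thread. A raw-string prefix changes only how the literal is spelled in source. The resulting object is an ordinary `str`, just as `0x16` is an ordinary `int`.

```python
# Raw-string prefixes only change how the literal is written in source;
# the resulting objects are ordinary str values.
raw = r'\n'          # backslash followed by the letter n
escaped = '\\n'      # the same two characters, spelled with an escape
newline = '\n'       # a single newline character

print(raw == escaped)            # True
print(len(raw), len(newline))    # 2 1
print(type(raw) is str)          # True
print(0x16 == 22)                # True: hex is likewise only notation
```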
Yes, that was a mistake. But the problem remains::
>>> re.sub('abc', r'a\nb\n.c\a', '123abcdefg') == re.sub('abc', 'a\\nb\\n.c\\a', '123abcdefg') == re.sub('abc', 'a\nb\n.c\a', '123abcdefg')
True
>>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
False
Why are the first two strings being treated as if they are the last one?
That is, why isn't '\\' being processed in the obvious way?
This still seems wrong. Why isn't it?
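The following sketch, assuming Python 3, shows what the session above appears to illustrate. When `repl` is a string, `re.sub` applies its own escape processing to it, separate from the one the compiler performs on string literals. A template that contains the two characters backslash and `n` therefore yields a newline, and backslash and `a` yields BEL. The names `templates`, `wanted` and `doubled` are only illustrative. The last lines show two ways of keeping the backslashes literally: doubling them for the template parser, or passing a function, whose return value `re.sub` inserts without processing.

```python
import re

source = '123abcdefg'
templates = [r'a\nb\n.c\a', 'a\\nb\\n.c\\a', 'a\nb\n.c\a']

# As strings, the first two templates hold literal backslashes; the third
# holds real newline and BEL characters, so it differs from them.
print(templates[0] == templates[1], templates[1] == templates[2])  # True False

# re.sub runs its own escape processing over a string replacement, turning
# backslash-n into a newline and backslash-a into BEL (chr(7)), so all three
# templates produce the same output.
results = [re.sub('abc', t, source) for t in templates]
print(results[0] == results[1] == results[2])   # True
print(repr(results[0]))                          # '123a\nb\n.c\x07defg'

# To keep the backslashes literally, either double them for the template
# parser or pass a function, whose return value is inserted unprocessed.
wanted = templates[0]
doubled = wanted.replace('\\', '\\\\')
literal1 = re.sub('abc', doubled, source)
literal2 = re.sub('abc', lambda m: wanted, source)
print(literal1 == literal2 == '123' + wanted + 'defg')   # True
```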
More simply, consider::
>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\re.py", line 151, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python26\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python26\lib\re.py", line 260, in _compile_repl
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)
Why is this the proper handling of what one might think would be an
obvious substitution?
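A minimal Python 3 sketch of the case above, assuming the same template parsing as in the 2.6 traceback. The literal `'\\'` is a single backslash. To the replacement-template parser, a single backslash is the start of an escape with nothing after it, so it is rejected. In Python 3 the exception is `re.error`. Writing the template as `r'\\'` (two backslashes) produces one literal backslash in the output.

```python
import re

source = '123abcdefg'

# A lone backslash at the end of the template is an incomplete escape,
# so the template parser rejects it (re.error in Python 3).
caught = None
try:
    re.sub('abc', '\\', source)
except re.error as exc:
    caught = exc          # keep a reference; `exc` is cleared after the block
print('raised re.error:', caught is not None)   # True

# Doubling the backslash in the template yields one literal backslash.
out = re.sub('abc', r'\\', source)
print(out)          # 123\defg
print(len(out))     # 8
```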
Thanks,
Alan Isaac
--
http://mail.python.org/mailman/listinfo/python-list