New submission from Dan Snider <mr.assume.a...@gmail.com>:

At present, the bytecode compiler can generate 512 different unicode 
characters, one for each integer in the range [0, 512), 512 being the total 
number of syntactically valid sequences of 3 octal digits preceded by a 
backslash. However, this does not match the regex compiler, which raises an 
error regardless of the input type when it encounters an octal escape 
with a decimal value greater than 255. On the other hand, the bytes 
literal:

>>> b'\407'

is somehow valid (the 9th bit is silently dropped, leaving b'\x07'), and can 
lead to bugs that are extremely difficult to track down, such as this nonsense:

>>> import re
>>> re.compile(b'\407').search(b'\a')
<re.Match object; span=(0, 1), match=b'\x07'>
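
The mismatch described above can be reproduced with a short script. This is 
only a minimal sketch: it assumes a CPython version that still accepts 
out-of-range octal escapes in literals (recent versions emit a warning for 
them, which is silenced here), and parse_literal and re_rejects are 
hypothetical helper names, not part of any library.

```python
import ast
import re
import warnings


def parse_literal(source):
    """Evaluate a str/bytes literal given as source text.

    Hypothetical helper: warnings about out-of-range octal escapes
    are silenced so the literal's value can be inspected.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


def re_rejects(pattern):
    """Return True if the regex compiler refuses the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# Raw strings keep the backslash, so the escape is handled by the
# literal parser (first two calls) or the regex parser (last two).
text = parse_literal(r"'\407'")    # str literal: full 9-bit value is kept
data = parse_literal(r"b'\407'")   # bytes literal: 9th bit is dropped

print(ord(text) == 0o407)                  # str keeps code point 263
print(data == bytes([0o407 & 0xFF]))       # bytes is truncated to 0x07
print(re_rejects(r"\407"), re_rejects(rb"\407"))  # regex rejects both types
```

The same three-character escape therefore has three different outcomes 
depending on which parser sees it.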

I propose that the regex parser be augmented to interpret, for unicode 
patterns, three-digit octal escapes in range(256, 512), and that the bytecode 
parser be adjusted to match the behavior of the regex parser by raising an 
error for octal escapes in bytes literals greater than \377, rather than 
truncating the 9th bit.
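
As an illustration of the rule proposed above, here is a rough sketch of 
how a single octal escape might be validated. The function name 
octal_escape_value and its for_bytes parameter are hypothetical, and the 
use of ValueError is an assumption; nothing here reflects the actual 
CPython or re implementation.

```python
def octal_escape_value(digits: str, for_bytes: bool) -> int:
    """Hypothetical helper: value of a 1-3 digit octal escape.

    Sketch of the proposed rule: str (unicode) contexts accept the
    whole range 0-0o777, bytes contexts reject anything above 0o377.
    """
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError(f"not an octal escape: \\{digits}")
    value = int(digits, 8)
    if for_bytes and value > 0o377:
        # Proposed behavior: fail loudly instead of dropping the 9th bit.
        raise ValueError(f"octal escape \\{digits} does not fit in one byte")
    return value


print(octal_escape_value("407", for_bytes=False))  # accepted for unicode
print(octal_escape_value("377", for_bytes=True))   # largest valid byte value
```

Under this sketch, octal_escape_value("407", for_bytes=True) raises instead 
of quietly returning 7.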

----------
messages: 346246
nosy: bup
priority: normal
severity: normal
status: open
title: octal escapes applied inconsistently throughout the interpreter and lib

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37367>
_______________________________________