New submission from Dan Snider <mr.assume.a...@gmail.com>:
At present, the compiler accepts 512 distinct three-digit octal escapes in str literals, one for each integer in range(512) (0 through 511), because 512 is the number of possible sequences of 3 octal digits following a backslash. Each of them produces a different unicode character. However, this does not match the regex compiler. It raises an error, whether the pattern is str or bytes, when it encounters an octal escape whose value is greater than 255 (0o377).

On the other hand, the bytes literal

    >>> b'\407'

is somehow valid: the 9th bit is silently dropped, so it evaluates to b'\x07'. This can lead to bugs that are extremely difficult to track down, such as this nonsense:

    >>> re.compile(b'\407').search(b'\a')
    <re.Match object; span=(0, 1), match=b'\x07'>

I propose two changes. First, the regex parser should be extended so that, for unicode (str) patterns, it interprets three-digit octal escapes in range(256, 512). Second, the compiler's handling of bytes literals should be adjusted to match the regex parser: it should raise an error for octal escapes of \400 and above rather than truncating the 9th bit.

----------
messages: 346246
nosy: bup
priority: normal
severity: normal
status: open
title: octal escapes applied inconsistently throughout the interpreter and lib

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37367>
_______________________________________
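The inconsistency described in this report can be reproduced with a small script. This is a sketch, not part of the original report. The helper names eval_literal and regex_accepts are hypothetical. Recent CPython releases warn about octal escapes above \377, and a future release may reject them. For that reason, the literals are evaluated through ast.literal_eval with warnings suppressed rather than written directly in source.

```python
import ast
import re
import warnings


def eval_literal(source):
    """Evaluate source text for a str/bytes literal and return its value.

    Warnings are silenced because recent CPython releases warn about octal
    escapes above 0o377; if an interpreter rejects the literal outright,
    None is returned instead of raising.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            return ast.literal_eval(source)
        except SyntaxError:
            return None


def regex_accepts(pattern):
    """Return True if re.compile() accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# str literal: '\407' becomes chr(0o407), i.e. chr(263).
str_value = eval_literal(r"'\407'")
# bytes literal: b'\407' keeps only the low 8 bits, (0o407 & 0xFF) == 0o007.
bytes_value = eval_literal(r"b'\407'")

# The regex parser rejects the same escape for both str and bytes patterns.
print(regex_accepts(r"\407"), regex_accepts(rb"\407"))  # False False

print(repr(str_value), repr(bytes_value))

if bytes_value is not None:
    # re never sees "\407"; it only receives the already-truncated byte 0x07,
    # so the pattern matches BEL (b"\a").
    print(re.compile(bytes_value).search(b"\a"))
```

On interpreters that still accept the escape, str_value is chr(263) while bytes_value is b'\x07'. The final search then reproduces the surprising match shown above.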