Jeffrey Kintscher <websur...@surf2c.net> added the comment:
Here is the problematic code in _PyBytes_DecodeEscape in Objects/bytesobject.c: c = s[-1] - '0'; if (s < end && '0' <= *s && *s <= '7') { c = (c<<3) + *s++ - '0'; if (s < end && '0' <= *s && *s <= '7') c = (c<<3) + *s++ - '0'; } *p++ = c; c is an int, and p is a char pointer to the new bytes object's string buffer. For b'\407', c gets correctly calculated as 263 (0x107), but the upper bits are lost when it gets recast as a char and stored in the location pointed to by p. Hence, b'\407' becomes b'\x07' when the object is created. IMO, this should raise "ValueError: bytes must be in range(0, 256)" instead of silently throwing away the upper bits. I will work on a PR. I also took a look at how escaped hex values are handled by the same function. It may seem at first glance that >>> b'\x107' b'\x107' is returning the hex value 0x107, but in reality it is returning '\x10' as the first character and '7' as the second character. While visually misleading, it is syntactically and semantically correct. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue37367> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com