Jeffrey Kintscher <websur...@surf2c.net> added the comment:

Here is the problematic code in _PyBytes_DecodeEscape in Objects/bytesobject.c:

            c = s[-1] - '0';
            if (s < end && '0' <= *s && *s <= '7') {
                c = (c<<3) + *s++ - '0';
                if (s < end && '0' <= *s && *s <= '7')
                    c = (c<<3) + *s++ - '0';
            }
            *p++ = c;

c is an int, and p is a char pointer into the new bytes object's string buffer.
For b'\407', c is correctly calculated as 263 (0x107), but the upper bits are
lost when the value is implicitly converted to char by the store through p.
Hence, b'\407' becomes b'\x07' when the object is created.
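
To make the arithmetic concrete, here is a rough Python sketch that mimics the
C logic above. The function name decode_octal_escape is hypothetical, and the
"& 0xFF" mask stands in for the store through a char pointer, assuming an
8-bit char. It is only an illustration, not the interpreter's code.

```python
def decode_octal_escape(digits):
    """Accumulate up to three octal digits, as the C snippet does.

    Returns (full_value, stored_value). The mask models the narrowing
    store through a char pointer, assuming char is 8 bits wide.
    """
    c = 0
    for ch in digits[:3]:
        if not "0" <= ch <= "7":
            break
        c = (c << 3) + (ord(ch) - ord("0"))
    stored = c & 0xFF  # assumption: only the low 8 bits survive the store
    return c, stored


full, stored = decode_octal_escape("407")
print(full, hex(full), stored)  # 263 0x107 7
```

Any three-digit octal escape above \377 loses its ninth bit in the same way.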

IMO, this should raise "ValueError: bytes must be in range(0, 256)" instead of 
silently throwing away the upper bits.  I will work on a PR.

I also took a look at how escaped hex values are handled by the same function.  
It may seem at first glance that

>>> b'\x107'
b'\x107'

is returning the hex value 0x107, but in reality a \x escape consumes exactly
two hex digits, so the result is '\x10' as the first character and '7' as the
second character.  While visually misleading, it is syntactically and
semantically correct.
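
A quick check in Python shows the two-byte result described above. This is
just a small illustration, and the variable name data is arbitrary.

```python
data = b'\x107'

# The literal holds two bytes: 0x10 followed by the ASCII character '7'.
assert len(data) == 2
assert data == b'\x10' + b'7'

print(list(data))  # [16, 55]
```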

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37367>
_______________________________________