On 2014-01-07 17:46, Andrew Barnert wrote:
> I think Stephen's name "7-bit" is confusing people. If you try to
> interpret the name sensibly, you get Steven's broken interpretation.
> But if you read it as a nonsense word and work through the logic, it
> all makes sense.
>
> On Jan 7, 2014, at 7:44, Steven D'Aprano <st...@pearwood.info> wrote:
>
I was thinking about Ethan's suggestion of introducing a new bytestring
class and a lot of these suggestions are what I thought the bytestring
class could do.
[snip]
>>
>> Suppose we take a pure-ASCII byte-string and decode it:
>>
>>     b'abcd'.decode('ascii-compatible')
>>
That would be:

    bytestring(b'abcd')

or even:

    bytestring('abcd')

[snip]
>
>> Suppose we take a byte-string with a non-ASCII byte:
>>
>>     b'abc\xFF'.decode('ascii-compatible')
>>
That would be:

    bytestring(b'abc\xFF')

Bytes outside the ASCII range would be mapped to Unicode low
surrogates:

    bytestring(b'abc\xFF') == bytestring('abc\uDCFF')

[snip]
>> Presumably they will compare equal, yes?
>
> I would hope not. One of them has the Unicode character U+FF, the
> other has smuggled byte 0xFF, so they'd better not compare equal.
>
> However, the latter should compare equal to 'abc\uDCFF'. That's the
> entire key here: the new representation is nothing but a more compact
> way to represent strings that contain nothing but ASCII and surrogate
> escapes.
>
[snip]
>>
>> A concrete example:
>>
>>     s = b'abcd'.decode('ascii-compatible')
>>     t = 'x' # ASCII-compatible
>>     s + t
>>     => returns 'abcdx', with the "7-bit repr" flag cleared.

    s = bytestring(b'abcd')
    t = 'x' # ASCII-compatible
    s + t
    => returns 'abcdx'

>
> Right. Here both s and t are normal 8-bit strings reprs in the first
> place, so the new logic doesn't even get invoked. So yes, that's what
> it returns.
>
>>     s = b'abcd'.decode('ascii-compatible')
>>     t = 'ÿ' # U+00FF, non-ASCII.
>>
>>     s + t
>>     => returns 'abcd\uDCFF', with the "7-bit repr" flag set

    s = bytestring(b'abcd')
    t = 'ÿ' # U+00FF, non-ASCII.
    s + t
    => returns 'abcd\xFF'

[snip]
There were also some other equivalences I was considering:

    bytestring(b'abc') == b'abc'

    bytestring(b'abc') == 'abc'

The problem there is that it wouldn't be transitive because:

    b'abc' != 'abc'
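For anyone who wants to poke at this, here is a rough sketch of the equivalences above. The `bytestring` class is purely hypothetical (no such type exists, and its internals here are my own guess); the only real machinery it relies on is the existing `surrogateescape` error handler, which already maps undecodable bytes 0x80..0xFF to U+DC80..U+DCFF. Having `+` with a `str` return a plain `str` is likewise an assumption taken from the concrete examples, not a settled design.

```python
# Real, existing behaviour: 'surrogateescape' smuggles byte 0xFF as U+DCFF.
smuggled = b'abc\xFF'.decode('ascii', errors='surrogateescape')
round_trip = smuggled.encode('ascii', errors='surrogateescape')


class bytestring:
    """Hypothetical illustration only; no such built-in type exists."""

    def __init__(self, value):
        if isinstance(value, bytestring):
            self._text = value._text
        elif isinstance(value, (bytes, bytearray)):
            # Assumption: store the text form, with non-ASCII bytes kept
            # as lone surrogates via the real 'surrogateescape' handler.
            self._text = bytes(value).decode('ascii', 'surrogateescape')
        elif isinstance(value, str):
            self._text = value
        else:
            raise TypeError('expected bytes, str or bytestring')

    def __eq__(self, other):
        # Compare equal to bytes *and* to str with the same content.
        if isinstance(other, (bytes, bytearray, str, bytestring)):
            return self._text == bytestring(other)._text
        return NotImplemented

    # Equality that is not transitive cannot support a consistent hash.
    __hash__ = None

    def __add__(self, other):
        # Assumption: adding a str yields a plain str, as in the examples.
        if isinstance(other, str):
            return self._text + other
        return NotImplemented

    def __repr__(self):
        return 'bytestring({!r})'.format(self._text)


print(smuggled == 'abc\udcff')         # smuggled byte, compares equal
print(smuggled == 'abc\xff')           # U+00FF is a different character
print(bytestring(b'abc') == b'abc')    # equal to the bytes ...
print(bytestring(b'abc') == 'abc')     # ... and equal to the str ...
print(b'abc' == 'abc')                 # ... which are not equal to each other
```

The last three comparisons are the transitivity problem in miniature: the hypothetical object equals both operands, yet the operands never equal each other in Python 3.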