On Thu, Nov 2, 2017 at 7:34 AM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> On 11/1/17 4:17 PM, MRAB wrote:
>>
>> On 2017-11-01 19:26, Ned Batchelder wrote:
>>>
>>> From David Beazley
>>> (https://twitter.com/dabeaz/status/925787482515533830):
>>>
>>>     >>> a = 'n'
>>>     >>> b = 'ñ'
>>>     >>> sys.getsizeof(a)
>>>     50
>>>     >>> sys.getsizeof(b)
>>>     74
>>>     >>> float(b)
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     ValueError: could not convert string to float: 'ñ'
>>>     >>> sys.getsizeof(b)
>>>     77
>>>
>>> Huh?
>>>
>> It's all explained in PEP 393.
>>
>> It's creating an additional representation (UTF-8 + zero-byte terminator)
>> of the value and is caching that, so there'll then be the bytes for 'ñ' and
>> the bytes for the UTF-8 (0xC3 0xB1 0x00).
>>
>> When the string is ASCII, the bytes of the UTF-8 representation are
>> identical to those of the original string, so it can share them.
>
>
> That explains why b is larger than a to begin with, but it doesn't explain
> why float(b) is changing the size of b.

b doesn't initially even _have_ a UTF-8 representation. When float()
tries to parse the string, it asks for the UTF-8 form, and that form
gets saved into the string object in case it's needed later.

ChrisA
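
For anyone who wants to check the arithmetic, here is a rough sketch of the experiment from the quoted session. It assumes CPython (sys.getsizeof is implementation-specific), and utf8_cache_growth is a made-up helper name, not anything from the standard library. The absolute sizes depend on the build, and whether float() ends up caching the UTF-8 form on the original string is an implementation detail that may differ between CPython versions, so the observed delta may well be 0 rather than the 3 seen above.

```python
import sys


def utf8_cache_growth(s):
    """Return (size_before, size_after) around a float(s) attempt.

    Hypothetical helper for illustration only; assumes CPython.
    """
    before = sys.getsizeof(s)
    try:
        float(s)
    except ValueError:
        pass  # only the possible caching side effect is of interest
    after = sys.getsizeof(s)
    return before, after


b = "\u00f1"  # 'ñ'

# Space a cached UTF-8 copy would need: the encoded bytes plus a
# zero-byte terminator, i.e. 0xC3 0xB1 0x00.
expected_extra = len(b.encode("utf-8")) + 1

before, after = utf8_cache_growth(b)
print("bytes a UTF-8 cache would add:", expected_extra)
print("observed:", before, "->", after, "(delta", after - before, ")")
```

The 3 bytes computed for expected_extra line up with the jump from 74 to 77 in the quoted session. An ASCII string such as 'n' would not be expected to grow this way, since its UTF-8 form is the same bytes it already stores.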