On Fri, Jul 26, 2013 at 5:07 AM, <wxjmfa...@gmail.com> wrote:
> Let start with a simple string \textemdash or \texttendash
>
>>>> sys.getsizeof('–')
> 40
>>>> sys.getsizeof('a')
> 26

Most of the cost is in those two apostrophes, look:

>>> sys.getsizeof('a')
26
>>> sys.getsizeof(a)
8

Okay, that's slightly unfair (bonus points: figure out what I did to
make this work; there are at least two right answers) but still, look
at what an empty string costs:

>>> sys.getsizeof('')
25

Or look at the difference between one of these characters and two:

>>> sys.getsizeof('aa')-sys.getsizeof('a')
1
>>> sys.getsizeof('––')-sys.getsizeof('–')
2

That's what the characters really cost. The overhead is fixed. It is,
in fact, almost completely insignificant. The storage requirement for
a non-ASCII, BMP-only string converges to two bytes per character.

ChrisA
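
For anyone who wants to check the "fixed overhead, then N bytes per character" claim on their own build, here is a rough sketch. It assumes CPython 3.3 or later (the PEP 393 string representation); the helper names per_char_cost and fixed_overhead are made up for this post, and the absolute overhead will differ between 32-bit and 64-bit builds and between versions, so only the per-character figures should be expected to match.

```python
import sys

def per_char_cost(ch, n=1000):
    """Marginal bytes per character: compare strings of length n and 2n.

    The fixed header cancels out in the subtraction, leaving only the
    cost of the n extra characters.
    """
    shorter = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // n

def fixed_overhead(ch, n=1000):
    """Bytes left over once the per-character cost is subtracted."""
    return sys.getsizeof(ch * n) - n * per_char_cost(ch, n)

if __name__ == "__main__":
    # ASCII letter, en dash (non-ASCII but in the BMP), and an astral character.
    for ch in ("a", "\u2013", "\U0001F600"):
        print(ascii(ch), per_char_cost(ch), fixed_overhead(ch))
```

On such a build the middle column should come out as 1, 2 and 4 bytes per character; the last column is the fixed overhead, which stays the same however long the string gets.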