On 07/25/2013 01:07 PM, wxjmfa...@gmail.com wrote:
> Let start with a simple string \textemdash or \texttendash
>
> >>> sys.getsizeof('–')
> 40
> >>> sys.getsizeof('a')
> 26
That's meaningless. You're comparing the overhead of the string object itself (a one-time cost anyway), not the overhead of storing the actual characters. This is the only meaningful comparison:

>>> sys.getsizeof('––') - sys.getsizeof('–')
>>> sys.getsizeof('aa') - sys.getsizeof('a')

Actually, I'm not even sure what your point is after all this time of railing against the FSR. You have said in the past that Python penalizes users of character sets that require wider byte encodings, but what would you have us do? Use 4-byte characters and penalize everyone equally? Use 2-byte characters that incorrectly expose surrogate pairs for some characters? Use UTF-8 in memory and do O(n) indexing?

Are your programs (actual programs, not contrived benchmarks) actually slower because of the FSR? Is the FSR incorrect? If so, according to what part of the Unicode standard?

I'm not trying to troll, or feed the troll. I'm actually curious. I think perhaps you feel that many of us who don't use Unicode often don't understand Unicode, because some of us don't understand you. If so, I'm not sure that's actually true.
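For the record, here is a minimal sketch of what those subtractions show, assuming CPython 3.3+ with the PEP 393 flexible string representation (exact byte counts vary by version and platform, and char_cost is just a hypothetical helper name for illustration):

import sys

def char_cost(ch):
    # Bytes that one extra copy of `ch` adds to a string. Subtracting
    # the two sizes cancels the fixed per-object header, leaving the
    # per-character storage cost.
    return sys.getsizeof(ch * 2) - sys.getsizeof(ch)

print(char_cost('a'))            # 1: Latin-1-only text stores 1 byte/char
print(char_cost('–'))            # 2: U+2013 EN DASH forces the 2-byte form
print(char_cost('\U0001F600'))   # 4: astral code points force 4 bytes/char

So the FSR pays 2 or 4 bytes per character only for strings that actually contain such characters, which is exactly the trade-off the questions above are getting at.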