Re: How to waste computer memory?

BartC Sat, 19 Mar 2016 05:27:07 -0700

On 19/03/2016 11:07, Marko Rauhamaa wrote:
> Chris Angelico <ros...@gmail.com>:
>
>> On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
>>> Unicode made several (understandable but grave) mistakes along the way:
>>>
>>>     * normalization
>>
>> Elaborate please? What's such a big mistake here?
>
> Unicode shouldn't have allowed multiple equivalent variants for a
> string.
>
> Now Python falls victim to:
>
>     >>> '\u006e\u0303' == '\u00f1'
>     False
>
> <URL: https://en.wikipedia.org/wiki/Unicode_equivalence>:
>
>     For example, the code point U+006E (the Latin lowercase "n") followed
>     by U+0303 (the combining tilde "◌̃") is defined by Unicode to be
>     canonically equivalent to the single code point U+00F1 (the lowercase
>     letter "ñ" of the Spanish alphabet). Therefore, those sequences
>     should be displayed in the same manner, should be treated in the same
>     way by applications such as alphabetizing names or searching, and may
>     be substituted for each other.
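
For reference, the comparison quoted above can be made to succeed by normalizing both sides first. This is only a minimal sketch using the standard unicodedata module; the helper name canonically_equal is made up for illustration and is not something from the thread or the standard library.

```python
import unicodedata

decomposed = '\u006e\u0303'   # 'n' followed by the combining tilde
precomposed = '\u00f1'        # the single code point for 'ñ'

def canonically_equal(a, b):
    # Hypothetical helper: compare two strings after putting both
    # into the same normalization form (NFC here; NFD works as well).
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)

print(decomposed == precomposed)                   # False
print(canonically_equal(decomposed, precomposed))  # True
```

Plain == compares code points one by one, so the two spellings only compare equal once both have been brought into the same normalization form.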



So a string that looks like:

"ññññññññññññññññññññññññññññññññññññññññññññññññññ"

can have 2**50 different representations? And occupy somewhere between 50 and 200 bytes? Or is that 400?
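
A rough way to check those figures, assuming each "ñ" is spelled in one of just two ways (precomposed U+00F1, or "n" plus U+0303) and measuring only the encoded payload, not any per-object overhead. The variable names are made up for this sketch.

```python
import unicodedata

COUNT = 50
nfc = '\u00f1' * COUNT                    # every ñ precomposed: 50 code points
nfd = unicodedata.normalize('NFD', nfc)   # every ñ decomposed: 100 code points

# Each of the 50 positions independently takes one of the two spellings.
variants = 2 ** COUNT

print(variants)             # 1125899906842624
print(len(nfc), len(nfd))   # 50 100

# Encoded size of the shortest and longest spelling in a few encodings.
for enc in ('utf-8', 'utf-16-le', 'utf-32-le'):
    print(enc, len(nfc.encode(enc)), len(nfd.encode(enc)))
# utf-8 100 150
# utf-16-le 100 200
# utf-32-le 200 400
```

So the answer depends on the encoding. The 50-byte lower bound would apply to a one-byte-per-character store such as Latin-1, which is what CPython 3.3+ uses internally when every code point is below U+0100; the fully decomposed form contains U+0303 and so needs two bytes per code point there, giving 200. The 400 figure matches UTF-32 for the fully decomposed string.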


OK...

--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
