On Aug 24, 12:22 am, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Thu, Aug 23, 2012 at 12:33 PM, <wxjmfa...@gmail.com> wrote:
> >> > >>> sys.getsizeof('a' * 80 * 50)
> >> > 4025
> >> > >>> sys.getsizeof('a' * 80 * 50 + '•')
> >> > 8040
>
> >> This example is still benefiting from shrinking the number of bytes
> >> in half over using 32 bits per character as was the case with Python 3.2:
>
> >> >>> sys.getsizeof('a' * 80 * 50)
> >> 16032
> >> >>> sys.getsizeof('a' * 80 * 50 + '•')
> >> 16036
>
> > Correct, but how many times does it happen?
> > Practically never.
>
> What are you talking about? Surely it happens the same number of
> times that your example happens, since it's the same example. By
> dismissing this example as being too infrequent to be of any
> importance, you dismiss the validity of your own example as well.
>
> > In this unicode stuff, I'm fascinated by the obsession
> > to solve a problem which is, due to the nature of
> > Unicode, unsolvable.
>
> > For every optimization algorithm, for every code
> > point range you can optimize, it is always possible
> > to find a case breaking that optimization.
>
> So what? Similarly, for any generalized data compression algorithm,
> it is possible to engineer inputs for which the "compressed" output is
> as large as or larger than the original input (this is easy to prove).
> Does this mean that compression algorithms are useless? I hardly
> think so, as evidenced by the widespread popularity of tools like gzip
> and WinZip.
>
> You seem to be saying that because we cannot pack all Unicode strings
> into 1-byte or 2-byte per character representations, we should just
> give up and force everybody to use maximum-width representations for
> all strings. That is absurd.
>
> > Sure, it is possible to optimize the unicode usage
> > by not using French characters, punctuation, mathematical
> > symbols, currency symbols, CJK characters...
> > (select undesired characters here: http://www.unicode.org/charts/).
>
> > In that case, why using unicode?
> > (A problematic not specific to Python)
>
> Obviously, it is because I want to have the *ability* to represent all
> those characters in my strings, even if I am not necessarily going to
> take advantage of that ability in every single string that I produce.
> Not all of the strings I use are going to fit into the 1-byte or
> 2-byte per character representation. Fine, whatever -- that's part of
> the cost of internationalization. However, *most* of the strings that
> I work with (this entire email message, for instance) -- and, I think,
> most of the strings that any developer works with (identifiers in the
> standard library, for instance) -- will fit into at least the 2-byte
> per character representation. Why shackle every string everywhere to
> 4 bytes per character when for a majority of them we can do much
> better than that?
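
For anyone who wants to check the effect being argued about, here is a minimal sketch, assuming CPython 3.3 or later (where strings are stored at 1, 2 or 4 bytes per character depending on the widest code point they contain). The helper name bytes_per_char is my own, not something from the thread or the standard library. It doubles a string and measures how much sys.getsizeof grows, so the fixed object header cancels out and only the per-character cost is left.

```python
import sys


def bytes_per_char(s):
    """Estimate storage per code point (hypothetical helper, not a stdlib API).

    Doubling the string keeps the same internal width, so the growth in
    sys.getsizeof divided by len(s) is the per-character cost.
    """
    if not s:
        raise ValueError("need a non-empty string")
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) // len(s)


ascii_text = 'a' * 80 * 50                 # the example from the thread
bmp_text = ascii_text + '\u2022'           # same text plus a bullet (U+2022)
astral_text = ascii_text + '\U0001F600'    # same text plus a non-BMP character

for label, text in [("ASCII only", ascii_text),
                    ("with U+2022", bmp_text),
                    ("with U+1F600", astral_text)]:
    print(label, bytes_per_char(text), "byte(s) per character")
```

On CPython 3.3+ I would expect this to report 1, 2 and 4 bytes per character respectively. The absolute getsizeof figures differ between versions and builds, which is why only the difference is measured here.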
Actually, what exactly are you (jmf) asking for? It's not clear to anybody, as best as we can see...
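
P.S. On the compression analogy in the quoted message: the "easy to prove" part is a counting argument. There are 2**n distinct inputs of n bits but only 2**n - 1 bit strings that are shorter, so no lossless compressor can shrink every input. A rough sketch of what that looks like in practice, using zlib from the standard library (the compressed_ratio helper and the sample data are my own choices for illustration):

```python
import random
import zlib


def compressed_ratio(data):
    """Return compressed size divided by original size (hypothetical helper)."""
    return len(zlib.compress(data, 9)) / len(data)


# Seeded pseudo-random bytes stand in for an input with no exploitable pattern.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(4000))

# Highly repetitive input, the kind a general-purpose compressor handles well.
repetitive = b'a' * 4000

print("noise ratio:     ", compressed_ratio(noise))
print("repetitive ratio:", compressed_ratio(repetitive))
```

The noise should come out slightly larger than it went in, while the repetitive input shrinks to a small fraction of its size. That is the same trade-off as the flexible string representation: a worst case exists, but the common case wins.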