On Aug 28, 4:57 am, Neil Hodgson <nhodg...@iinet.net.au> wrote:
> wxjmfa...@gmail.com:
>
> > Go "has" the integers int32 and int64. A rune ensure
> > the usage of int32. "Text libs" use runes. Go has only
> > bytes and runes.
>
> Go's text libraries use UTF-8 encoded byte strings. Not arrays of
> runes. See, for example, http://golang.org/pkg/regexp/
>
> Are you claiming that UTF-8 is the optimum string representation and
> therefore should be used by Python?
>
> Neil
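For anyone unclear on the distinction Neil is drawing, here is a minimal illustration in Python (not Go): a UTF-8 encoded byte string is a sequence of bytes, while a Python 3 str is a sequence of code points, and lengths differ accordingly.

```python
# One code point, U+20AC (EURO SIGN), versus its UTF-8 byte encoding.
s = "€"
b = s.encode("utf-8")

print(len(s))  # 1 -- str counts code points
print(len(b))  # 3 -- UTF-8 needs three bytes: b'\xe2\x82\xac'
```

Go's string type holds the UTF-8 bytes directly; indexing a Go string yields bytes, and you only get runes (code points) by decoding, e.g. when ranging over the string.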
This whole rune/go business is a red herring. In the other thread Peter
Otten wrote:

> wxjmfa...@gmail.com wrote:
>
> > By chance and luckily, first attempt.
>
> > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 1000000 loops, best of 3: 1.48 usec per loop
> > c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 100000 loops, best of 3: 7.62 usec per loop
>
> OK, that is roughly factor 5. Let's see what I get:
>
> $ python3.2 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 100000 loops, best of 3: 1.8 usec per loop
> $ python3.3 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 10000 loops, best of 3: 9.11 usec per loop
>
> That is factor 5, too. So I can replicate your measurement on an AMD64
> Linux system with self-built 3.3 versus system 3.2.
>
> > Note
> > The used characters are not members of the latin-1 coding
> > scheme (btw an *unusable* coding).
>
> They are however characters in cp1252 and mac-roman.
>
> You seem to imply that the slowdown is connected to the inability of
> latin-1 to encode "œ" and "€" (to take the examples relevant to the
> above microbench). So let's repeat with latin-1 characters:
>
> $ python3.2 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 100000 loops, best of 3: 1.76 usec per loop
> $ python3.3 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 10000 loops, best of 3: 10.3 usec per loop
>
> Hm, the slowdown is even a tad bigger. So we can safely dismiss your
> theory that an unfortunate choice of the 8 bit encoding is causing
> it. Do you

In summary:
1. The problem is not on jmf's computer
2. It is not windows-only
3. It is not directly related to whether the text is latin-1 encodable

The only question which is not yet clear is this: a typical string
operation is complexity O(n), or in more detail O(a + bn). If only a is
worse going from 3.2 to 3.3, it may be a small issue. If b is worse by
even a tiny amount, it is likely to be a significant regression for some
use-cases.
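The a-versus-b question can be probed empirically. The sketch below (the helper name `time_replace` is mine, not from the thread) times the same replace() at two string lengths and solves the two linear equations t = a + b*n; comparing the fitted a and b across 3.2 and 3.3 would show whether the regression is fixed overhead or per-character cost.

```python
import timeit

def time_replace(n, repeat=5, number=10_000):
    """Best per-call time (seconds) for replace() on an n-char string."""
    setup = f's = "ä" * {n}'
    stmt = 's.replace("ä", "ß")'
    return min(timeit.repeat(stmt, setup, repeat=repeat, number=number)) / number

# Two measurement points (n, t) determine the line t = a + b*n.
n1, n2 = 100, 10_000
t1, t2 = time_replace(n1), time_replace(n2)
b = (t2 - t1) / (n2 - n1)   # per-character cost
a = t1 - b * n1             # fixed overhead
print(f"a (fixed overhead) ~ {a:.3e} s, b (per char) ~ {b:.3e} s")
```

This is only a rough two-point fit; timing noise means a proper comparison should use several lengths and a least-squares fit, but it is enough to tell a constant-factor slowdown from a per-character one.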
So, doing some arm-chair thinking (I don't know the code or the
difficulty involved): there are clearly three string engines in the
Python 3 world:

- 3.2 narrow
- 3.2 wide
- 3.3 (flexible)

How difficult would it be to offer the choice of string engine as a
command-line flag? This would avoid the nuisance of having two binaries
-- narrow and wide. And it would give the Python programmer a choice of
efficiency profiles.
-- 
http://mail.python.org/mailman/listinfo/python-list
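For reference, the three engines are distinguishable at runtime. A narrow build reports sys.maxunicode == 0xFFFF, a wide build reports 0x10FFFF, and on 3.3+ (PEP 393, the "flexible" engine) maxunicode is always 0x10FFFF but per-character storage varies with content, which sys.getsizeof makes visible. A minimal sketch:

```python
import sys

# Classify the running interpreter's string engine.
if sys.maxunicode == 0xFFFF:
    engine = "narrow (UTF-16 code units, surrogate pairs for astral chars)"
elif sys.version_info >= (3, 3):
    engine = "flexible (PEP 393: 1, 2 or 4 bytes per char, by content)"
else:
    engine = "wide (4 bytes per char)"
print(engine)

if sys.version_info >= (3, 3):
    # On CPython 3.3+, adding one ASCII char grows the object by 1 byte...
    ascii_cost = sys.getsizeof("a" * 101) - sys.getsizeof("a" * 100)
    # ...while adding one BMP char such as "€" grows it by 2 bytes.
    bmp_cost = sys.getsizeof("€" * 101) - sys.getsizeof("€" * 100)
    print(ascii_cost, bmp_cost)  # 1 2
```

Whether the engines could actually be made switchable at runtime is another matter: C extensions compiled against one string representation would not work with another, which is why narrow/wide was a build-time choice in the first place.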