On Tue, Aug 28, 2012 at 8:42 PM, rusi <rustompm...@gmail.com> wrote:
> In summary:
> 1. The problem is not on jmf's computer
> 2. It is not windows-only
> 3. It is not directly related to latin-1 encodable or not
>
> The only question which is not yet clear is this:
> Given a typical string operation that is complexity O(n), in more
> detail it is going to be O(a + bn)
> If only a is worse going 3.2 to 3.3, it may be a small issue.
> If b is worse by even a tiny amount, it is likely to be a significant
> regression for some use-cases.

As has been pointed out repeatedly already, this is a microbenchmark.
jmf is focusing on one particular area (string construction) where
Python 3.3 happens to be slower than Python 3.2, ignoring the fact
that real code usually does lots of things other than building
strings, many of which are slower to begin with. In the real-world
benchmarks that I've seen, 3.3 is as fast as or faster than 3.2.

Here's a much more realistic benchmark that nonetheless still focuses
on strings: word counting.

Source: http://pastebin.com/RDeDsgPd

C:\Users\Ian\Desktop>c:\python32\python -m timeit -s "import wc" "wc.wc('unilang8.htm')"
1000 loops, best of 3: 310 usec per loop

C:\Users\Ian\Desktop>c:\python33\python -m timeit -s "import wc" "wc.wc('unilang8.htm')"
1000 loops, best of 3: 302 usec per loop

"unilang8.htm" is an arbitrary UTF-8 document containing a broad swath
of Unicode characters that I pulled off the web. Even though this
program is still mostly string processing, Python 3.3 wins. Of
course, that's not really a very good test -- since it reads the file
on every pass, it probably spends more time in I/O than it does in
actual processing. Let's try it again with prepared string data:

C:\Users\Ian\Desktop>c:\python32\python -m timeit -s "import wc; t = open('unilang8.htm', 'r', encoding='utf-8').read()" "wc.wc_str(t)"
10000 loops, best of 3: 87.3 usec per loop

C:\Users\Ian\Desktop>c:\python33\python -m timeit -s "import wc; t = open('unilang8.htm', 'r', encoding='utf-8').read()" "wc.wc_str(t)"
10000 loops, best of 3: 84.6 usec per loop

Nope, 3.3 still wins. And just for the sake of my own curiosity, I
decided to try it again using str.split() instead of a StringIO.
Since str.split() creates more strings, I expect Python 3.2 might
actually win this time.

C:\Users\Ian\Desktop>c:\python32\python -m timeit -s "import wc; t = open('unilang8.htm', 'r', encoding='utf-8').read()" "wc.wc_split(t)"
10000 loops, best of 3: 88 usec per loop

C:\Users\Ian\Desktop>c:\python33\python -m timeit -s "import wc; t = open('unilang8.htm', 'r', encoding='utf-8').read()" "wc.wc_split(t)"
10000 loops, best of 3: 76.5 usec per loop

Interestingly, although Python 3.2 performs the splits in about the
same time as the StringIO operation, Python 3.3 is significantly
*faster* using str.split(), at least on this data set.

> So doing some arm-chair thinking (I dont know the code and difficulty
> involved):
>
> Clearly there are 3 string-engines in the python 3 world:
> - 3.2 narrow
> - 3.2 wide
> - 3.3 (flexible)
>
> How difficult would it be to giving the choice of string engine as a
> command-line flag?
> This would avoid the nuisance of having two binaries -- narrow and
> wide.

Quite difficult. Even if we avoid having two or three separate
binaries, we would still have separate binary representations of the
string structs. It makes the maintainability of the software go down
instead of up.

> And it would give the python programmer a choice of efficiency
> profiles.

So instead of having just one test for my Unicode-handling code, I'll
now have to run that same test *three times* -- once for each possible
string engine option. Choice isn't always a good thing.

Cheers,
Ian
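
For readers who cannot reach the pastebin link, here is a minimal
sketch of what a wc module with the three benchmarked entry points
might look like. This is not the original source: the function bodies,
the helper _count_lines, and the choice to return a Counter of
whitespace-separated words are all assumptions made for illustration.
Only the names wc, wc_str, and wc_split come from the timeit commands
above.

```python
import io
from collections import Counter


def _count_lines(lines):
    """Hypothetical helper: count whitespace-separated words line by line."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def wc(filename):
    """Read a UTF-8 file on every call, then count its words (I/O included)."""
    with open(filename, "r", encoding="utf-8") as f:
        return _count_lines(f)


def wc_str(text):
    """Count words in an already-loaded string by iterating a StringIO."""
    return _count_lines(io.StringIO(text))


def wc_split(text):
    """Count words by splitting the whole string in one str.split() call."""
    return Counter(text.split())
```

Saved as wc.py next to the test document, a module like this could be
timed with the same python -m timeit invocations shown above. The
absolute numbers would differ from the ones reported here, since the
actual benchmarked code may do something else.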