On Mon, Jul 15, 2013 at 2:18 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 7/14/2013 10:56 AM, Chris Angelico wrote:
> An issue about finding strings in strings was opened last September and, as
> reported on this list, fixes were applied about last March. As I remember,
> some but not all of the optimizations were applied to 3.3. Perhaps some were
> applied too late for 3.3.1 (3.3.2 is 3.3.1 with some emergency patches to
> correct regressions).

D'oh. I knew there was something raised and solved regarding that, but
I forgot to go check a 3.4 alpha to see if it exhibited the same.
Whoops. My bad. Sorry!

> Python 3.4.0a2:
> >>> import timeit
>
> >>> timeit.repeat("a = 'hundred'; 'x' in a")
> [0.17396483610667152, 0.16277956641670813, 0.1627937074749941]
> >>> timeit.repeat("a = 'hundreo'; 'x' in a")
> [0.18441108179403187, 0.16277311071618783, 0.16270517215355085]
>
> The difference is gone, again, as previously reported.

Yep, that looks exactly like I would have hoped it would.

>> 0.1765129367 ASCII in ASCII, as set
>
> Much of this time is overhead; 'pass' would not run too much faster.
>
>> 0.1817367850 SMP in BMP
>> 0.1884555160 SMP in ASCII
>> 0.2132371572 BMP in ASCII
>
> For these, 3.3 does no searching because it knows from the internal char
> kind that the answer is No without looking.

Yeah, I mainly included those results so I could say to jmf "Look, FSR
allows some string membership operations to be, I kid you not, as fast
as set operations!".

>> 0.3137454621 ASCII in ASCII
>> 0.4472624314 BMP in BMP
>> 0.6672795006 SMP in SMP
>> 0.7493052888 ASCII in BMP
>> 0.9261783271 ASCII in SMP
>> 0.9865787412 BMP in SMP
>
>> Otherwise, an actual search must be done. Searching
>> for characters in strings of the same width gets slower as the strings
>> get larger in memory (unsurprising). What I'm seeing of the top-end
>> results, though, is that the search for a narrower string in a wider
>> one is quite significantly slower.
>
> 50% longer is not bad, even

Hard to give an estimate; my first tests were the ASCII in ASCII and
ASCII in BMP, which then looked more like 2:1 time. However, rescaling
the needle to BMP makes it more like the 50% you're quoting, so yes,
it's not as much as I thought.

In any case, the most important thing to note is: 3.4 has already
fixed this, ergo jmf should shut up about it. And here I thought I
could credit him with a second actually-useful report...
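For anyone who wants to rerun this kind of comparison on their own build, here is a rough sketch of a needle-in-haystack timing across the three string widths. It is not the script that produced the figures quoted above; the sample characters, the haystack length, the iteration count and the helper names (make_haystack, time_membership, run_all) are all assumptions picked for illustration, and the globals argument to timeit.Timer needs Python 3.5 or later.

```python
import timeit

# Assumed sample needles, one per string width under the flexible
# string representation: ASCII (1 byte/char), BMP (2), SMP (4).
CHARS = {
    "ASCII": "x",
    "BMP": "\u1234",
    "SMP": "\U00012345",
}

# Assumed "widening" characters: one of these at the end of the haystack
# forces it to be stored at the given width. None of them is a needle.
WIDEN = {
    "ASCII": "b",
    "BMP": "\u4321",
    "SMP": "\U0001F600",
}


def make_haystack(kind, length=1000):
    """Return a string of the given width that contains none of the needles."""
    return "a" * (length - 1) + WIDEN[kind]


def time_membership(needle_kind, haystack_kind, number=20000):
    """Best-of-three time for `needle in haystack`, where the answer is False."""
    names = {
        "needle": CHARS[needle_kind],
        "haystack": make_haystack(haystack_kind),
    }
    timer = timeit.Timer("needle in haystack", globals=names)
    return min(timer.repeat(repeat=3, number=number))


def run_all(number=20000):
    """Time every needle/haystack width combination."""
    return {
        (n, h): time_membership(n, h, number)
        for n in CHARS
        for h in WIDEN
    }


if __name__ == "__main__":
    # Print fastest first, in the same "time  X in Y" shape as the quoted list.
    for (n, h), t in sorted(run_all().items(), key=lambda kv: kv[1]):
        print("%.10f %s in %s" % (t, n, h))
```

The needle is always absent, so any combination that really has to search scans the whole haystack. Absolute numbers will vary by machine and Python version; only the relative ordering is worth comparing.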

ChrisA