On Wed, Oct 30, 2013 at 2:56 AM, Mark Lawrence <breamore...@yahoo.co.uk> wrote:
> You've stated above that logically unicode is badly handled by the fsr. You
> then provide a trivial timing example. WTF???
His idea of bad handling is "oh how terrible, ASCII and BMP have
optimizations". He hates the idea that some cases could be faster,
rather than everything having uniformly even timings. But the FSR
actually has some distinct benefits even in the areas he's citing -
watch this:

>>> import timeit
>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.3625614428649451
>>> timeit.timeit("a = 'hundreĳ'; 'x' in a")
0.6753936603674484
>>> timeit.timeit("a = 'hundred'; 'ģ' in a")
0.25663261671525106
>>> timeit.timeit("a = 'hundreĳ'; 'ģ' in a")
0.3582399439035271

The first two examples are his examples done on my computer, so you can
see how all four figures compare. Note how testing for the presence of
a non-Latin-1 character in an 8-bit string is very fast. The same goes
for testing for a non-BMP character in a 16-bit string. The difference
gets even larger if the string is longer:

>>> timeit.timeit("a = 'hundred'*1000; 'x' in a")
10.083378194714726
>>> timeit.timeit("a = 'hundreĳ'*1000; 'x' in a")
18.656413035735
>>> timeit.timeit("a = 'hundreĳ'*1000; 'ģ' in a")
18.436268855399135
>>> timeit.timeit("a = 'hundred'*1000; 'ģ' in a")
2.8308718007456264

Wow! The FSR speeds up searches immensely! It's obviously the best
thing since sliced bread!

ChrisA
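P.S. For anyone curious why the mismatched searches above are so fast:
under PEP 393, each string records the width needed for its widest
character (1, 2 or 4 bytes per character), so a membership test can
reject a needle that doesn't fit the haystack's representation without
scanning anything. Here's a rough sketch of that idea in pure Python -
my own illustration, not CPython's actual code. The width that CPython
reads straight from the string header in O(1) is recomputed here with
a scan, purely for demonstration:

def contains_char(haystack, needle):
    # Widest code point in the haystack. CPython keeps this as the
    # 'kind' field in the string object's header, so looking it up
    # costs nothing there; recomputing it is just for illustration.
    widest = max(map(ord, haystack)) if haystack else 0
    if ord(needle) > 0xFF and widest <= 0xFF:
        return False   # needle outside Latin-1, haystack is 8-bit
    if ord(needle) > 0xFFFF and widest <= 0xFFFF:
        return False   # needle outside the BMP, haystack is 16-bit
    return needle in haystack  # otherwise, do the normal scan

>>> contains_char('hundred', 'ģ')   # rejected without a scan
False
>>> contains_char('hundreĳ', 'x')   # falls through to a real search
False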