On 2 Apr, 01:43, Neil Hodgson <nhodg...@iinet.net.au> wrote:
> Mark Lawrence:
>
> > You've given many examples of the same type of micro benchmark, not many
> > examples of different types of benchmark.
>
> Trying to work out what jmfauth is on about, I found what appears to
> be a performance regression with '<' string comparisons on 64-bit
> Windows. It's around 30% slower on a 25-character string that differs in
> the last character, and 70-100% slower on a 100-character string that
> differs at the end.
>
> Can someone else please try this to see if it's reproducible? Linux
> doesn't show this problem.
>
> >c:\python32\python -u "charwidth.py"
> 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
> [0.7116295577956576, 0.7055591343157613, 0.7203483026429418]
>
> a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
> [0.7664397841378787, 0.7199902325464409, 0.713719289812504]
>
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
> [0.7341851791817691, 0.6994205901833599, 0.7106807593741005]
>
> a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
> [0.7346812372666784, 0.6995411113377914, 0.7064768417728411]
>
> >c:\python33\python -u "charwidth.py"
> 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
> [0.9913326076446045, 0.9455845241056282, 0.9459076605341776]
>
> a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
> [1.0472289217234318, 1.0362342484091207, 1.0197109728048384]
>
> a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
> [1.0439643704533834, 0.9878581050301687, 0.9949265834034335]
>
> a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
> [1.0987483965446412, 1.0130257167690004, 1.024832248526499]
>
> Here is the code:
>
> # encoding:utf-8
> import os, sys, timeit
> print(sys.version)
> examples = [
>     "a=['$b','$z']",
>     "a=['$λ','$η']",
>     "a=['$b','$η']",
>     "a=['$\U00020000','$\U00020001']"]
> baseDir = "C:/Users/Neil/Documents/"
> #~ baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
> for t in examples:
>     t = t.replace("$", baseDir)
>     # Using os.write as a simple way to get UTF-8 to stdout
>     os.write(sys.stdout.fileno(), t.encode("utf-8"))
>     print(sys.getsizeof(t))
>     # t is the setup string; the '<' comparison is what gets timed
>     print(timeit.repeat("a[0] < a[1]", t, number=5000000))
>     print()
>
> For a more significant performance difference, try replacing the
> baseDir setting with:
> baseDir = "C:/Users/Neil/Documents/Visual Studio 2012/Projects/Sigma/QtReimplementation/HLFKBase/Win32/x64/Debug"
>
> Neil
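The getsizeof() figures in that output track the per-character storage
width. Python 3.3's PEP 393 strings are stored at 1, 2 or 4 bytes per
character depending on the widest character present, whereas the 3.2
narrow build shipped on Windows stores every string at 2 bytes per
character (astral characters as surrogate pairs). A minimal sketch of
the effect (assuming CPython 3.3; the exact byte counts include a fixed
per-object overhead and vary between builds, so only the relative jumps
matter):

# One wide character changes the storage width of the whole string
# under PEP 393 (CPython 3.3+).
import sys

ascii_s  = "a" * 100                 # all chars < U+0080  -> 1 byte/char
latin_s  = "a" * 99 + "\u00e9"       # é, < U+0100         -> 1 byte/char
bmp_s    = "a" * 99 + "\u03b7"       # η, < U+10000        -> 2 bytes/char
astral_s = "a" * 99 + "\U00020001"   # 𠀁, >= U+10000      -> 4 bytes/char

for s in (ascii_s, latin_s, bmp_s, astral_s):
    print(len(s), sys.getsizeof(s))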
--------

Hi,

>c:\python32\pythonw -u "charwidth.py"
3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchz']168
[0.8343414906182101, 0.8336184057396241, 0.8330473419738562]
a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app\stringbenchη']168
[0.818378092261062, 0.8180854713107406, 0.8192279926793571]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchη']168
[0.8131353330542339, 0.8126985677326912, 0.8122744051977042]
a=['D:\jm\jmpy\py3app\stringbench𠀀','D:\jm\jmpy\py3app\stringbench𠀁']172
[0.8271094603211102, 0.82704053883214, 0.8265781741004083]
>Exit code: 0

>c:\Python33\pythonw -u "charwidth.py"
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchz']94
[1.3840254166697845, 1.3933888932429768, 1.391664674507438]
a=['D:\jm\jmpy\py3app\stringbenchλ','D:\jm\jmpy\py3app\stringbenchη']176
[1.6217970707185678, 1.6279369907932706, 1.6207041728220117]
a=['D:\jm\jmpy\py3app\stringbenchb','D:\jm\jmpy\py3app\stringbenchη']176
[1.5150522562729396, 1.5130369919353992, 1.5121890607025037]
a=['D:\jm\jmpy\py3app\stringbench𠀀','D:\jm\jmpy\py3app\stringbench𠀁']316
[1.6135375194801664, 1.6117739170366434, 1.6134331526540109]
>Exit code: 0

- Win7, 32 bits.
- The file is in UTF-8.
- Do not be put off by how the astral characters looked in this output;
  it is just a copy/paste from your excellent editor, whose output pane
  is configured to use the locale coding.
- Of course, and as expected, a console shows similar behaviour. (Which,
  btw, shows how good your application is.)

==========

Something different, from a previous msg on this thread:

---

> Sure. And over a different set of samples, it is less compact. If you
> write a lot of Latin-1, Python will use one byte per character, while
> UTF-8 will use two bytes per character.

I think you mean writing a lot of Latin-1 characters outside ASCII.
However, even people writing texts in, say, French will find that only
a small proportion of their text is outside ASCII, and so the cost of
UTF-8 is correspondingly small. The counter-problem is that a French
document that needs to include one mathematical symbol (or emoji)
outside Latin-1 will double in size as a Python string.

---

I already explained this. It is, how to say, a misunderstanding of
Unicode. What counts is not the number of non-ASCII chars you have in a
stream. What is relevant is that every char is handled by the same
algorithm, in this case UTF-8. Unicode takes you from the "char" up to
the Unicode transformation format; after that it is a question of
implementation. This is exactly what you are doing in Scintilla (maybe
without realizing it deeply).

An editor reflects the example I gave very well. You enter a thousand
ASCII chars, then - boom - as you enter a non-ASCII char, your editor
(assuming it uses a mechanism like the FSR) has to internally re-encode
everything!

jmf
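P.S. A minimal illustration of that last point (assuming CPython 3.3's
FSR; Python strings are immutable, so the concatenation below builds a
new string at the wider width rather than literally re-encoding one in
place):

# One char above U+FFFF moves the whole string from 1 to 4 bytes per
# character under the FSR, while its UTF-8 encoding grows by only 4 bytes.
import sys

text = "x" * 1000                    # pure ASCII
print(sys.getsizeof(text), len(text.encode("utf-8")))
# -> roughly 1000 plus a fixed overhead, and exactly 1000

wider = text + "\U00020001"          # append a single astral character
print(sys.getsizeof(wider), len(wider.encode("utf-8")))
# -> roughly 4004 plus a fixed overhead, and exactly 1004

The same arithmetic applies to the French-document case quoted above:
one character outside Latin-1 but inside the BMP moves the whole string
from 1 to 2 bytes per character, doubling it.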