Thorsten Kampe <thors...@thorstenkampe.de> wrote:
> * Steven D'Aprano (06 Aug 2009 19:17:30 GMT)
>> What if you're writing a loop which takes one million different lines of
>> text and decodes them once each?
>>
>> >>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
>> >>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
>> >>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
>> >>> t1.timeit(number=1)
>> 5.6751680374145508
>> >>> t2.timeit(number=1)
>> 2.6822888851165771
>>
>> Seems like a pretty meaningful difference to me.
>
> Bollocks. No one will even notice whether a code sequence runs 2.7 or
> 5.7 seconds. That's completely artificial benchmarking.

For a real-life example, I often have a file with one word per line, and
I run Python scripts to apply some (sometimes fairly trivial)
transformation over it.

REAL example: reading lines with word, lemma, tag separated by tabs from
stdin and writing the word to stdout, unless it starts with '<'
(~6e5 lines, python2.5, user times, warm cache; I hope the comments are
self-explanatory):

no unicode
user    0m2.380s

decode('utf-8'), encode('utf-8')
user    0m3.560s

sys.stdout = codecs.getwriter('utf-8')(sys.stdout);sys.stdin = codecs.getreader('utf-8')(sys.stdin)
user    0m6.180s

unicode(line, 'utf8'), encode('utf-8')
user    0m3.820s

unicode(line, 'utf-8'), encode('utf-8')
user    0m2.880s

python3.1
user    0m1.560s

Since I have something like 18 million words in my current project (and
>600 million overall) and I often tweak some parameters and re-run the
transformations, the differences are pretty significant.

Personally, I have been surprised by:
1) the bad performance of the codecs wrapper (I expected it to be on par
   with unicode(x, 'utf-8'), maybe slightly better due to fewer function
   calls)
2) the good performance of python3.1 (utf-8 locale)

-- 
Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/
garabik @ kassiopeia.juls.savba.sk
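
For reference, the transformation timed above (take the first of three tab-separated fields, skip anything starting with '<') can be sketched roughly as follows. This is not the script that was actually benchmarked, only a minimal Python 3 illustration; the names extract_words and run and the sample data are made up for the example. In Python 3 the text streams do the decoding and encoding themselves, so no explicit decode()/encode() calls appear.

```python
import io


def extract_words(lines):
    """Yield the first tab-separated field of each line.

    Lines whose first field starts with '<' (markup lines) are skipped.
    Hypothetical helper, not taken from the original script.
    """
    for line in lines:
        word = line.rstrip("\n").split("\t", 1)[0]
        if not word.startswith("<"):
            yield word


def run(stdin, stdout):
    """Copy the word column from stdin to stdout, one word per line."""
    for word in extract_words(stdin):
        stdout.write(word + "\n")


# Made-up sample input in the word<TAB>lemma<TAB>tag layout.
sample = io.StringIO("<doc>\ndogs\tdog\tNNS\nbark\tbark\tVBP\n</doc>\n")
out = io.StringIO()
run(sample, out)
result = out.getvalue()
```

For real use one would call run(sys.stdin, sys.stdout) after importing sys; the in-memory streams are used here only so that the sketch runs without any input file.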