Jason Tackaberry <tack <at> urandom.ca> writes:

> On Wed, 2009-08-05 at 16:43 +0200, Michael Ströder wrote:
> > These both expressions are equivalent but which is faster or should be used
> > for any reason?
> >
> > u = unicode(s,'utf-8')
> > u = s.decode('utf-8') # looks nicer
>
> It is sometimes non-obvious which constructs are faster than others in
> Python. I also regularly have these questions, but it's pretty easy to
> run quick (albeit naive) benchmarks to see.
>
> The first thing to try is to have a look at the bytecode for each:
[snip]
> The presence of LOAD_ATTR in the first form hints that this is probably
> going to be slower. Next, actually try it:
>
> >>> import timeit
> >>> timeit.timeit('"foobarbaz".decode("utf-8")')
> 1.698289155960083
> >>> timeit.timeit('unicode("foobarbaz", "utf-8")')
> 0.53305888175964355
>
> So indeed, unicode(s, 'utf-8') is faster by a fair margin.

Faster by an enormous margin; attributing this to the cost of attribute lookup
seems implausible.

Suggested further avenues of investigation:

(1) Try the timing again with "cp1252" and "utf8" and "utf_8"

(2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c

HTH,
John
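
A rough sketch of suggestion (1), for anyone who wants to rerun the comparison across encoding spellings. Two assumptions to note: it is written for Python 3, where str(data, enc) stands in for the Python 2 unicode(s, enc) constructor and data is a bytes object, so the numbers will not match the Python 2 figures quoted above; and time_decoders is a made-up helper name, not something from the thread or the standard library.

```python
import timeit


# Hypothetical helper (not from the original thread): times bytes.decode(enc)
# against the str(bytes, enc) constructor for several encoding spellings.
# Assumption: Python 3, where str(data, enc) replaces Python 2's unicode(s, enc).
def time_decoders(encodings=("utf-8", "utf8", "utf_8", "cp1252"), number=100000):
    data = b"foobarbaz"
    results = {}
    for enc in encodings:
        # Method form: attribute lookup on the bytes object, then the call.
        t_method = timeit.timeit(lambda: data.decode(enc), number=number)
        # Constructor form: call the built-in type directly.
        t_ctor = timeit.timeit(lambda: str(data, enc), number=number)
        results[enc] = (t_method, t_ctor)
    return results


if __name__ == "__main__":
    for enc, (t_method, t_ctor) in time_decoders().items():
        print("%-8s decode(): %.4f s   str(): %.4f s" % (enc, t_method, t_ctor))
```

If the gap between the two forms changes with the spelling of the encoding name, that points away from attribute lookup as the main cost, since the lookup is the same in every row.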